A paired-end sequencing strategy to map the complex landscape of transcription initiation.

Ting Ni, David L Corcoran, Elizabeth A Rach, Shen Song, Eric P Spana, Yuan Gao, Uwe Ohler, Jun Zhu.

Abstract

Recent studies using high-throughput sequencing protocols have uncovered the complexity of mammalian transcription by RNA polymerase II, helping to define several initiation patterns in which transcription start sites (TSSs) cluster in both narrow and broad genomic windows. Here we describe a paired-end sequencing strategy that enables more robust mapping and characterization of capped transcripts. We used this strategy to explore the transcription initiation landscape in the Drosophila melanogaster embryo. Extending the previous findings in mammals, we found that fly promoters exhibited distinct initiation patterns, which were linked to specific promoter sequence motifs. Furthermore, we identified many 5' capped transcripts originating from coding exons; our analyses suggest that they are unlikely to result from alternative TSSs and are instead the product of post-transcriptional modifications. We demonstrated paired-end TSS analysis to be a powerful method to uncover the transcriptional complexity of eukaryotic genomes.

Substances:
Drosophila Proteins
RNA

Year: 2010 PMID: 20495556 PMCID: PMC3197272 DOI: 10.1038/nmeth.1464

Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547

INTRODUCTION

Transcription by RNA Polymerase II (Pol II) is a critical step in eukaryotic gene expression. To initiate and modulate transcription, factors interact with chromatin and DNA sequence features in regulatory regions. Central to this process is the core promoter region of approximately 100 nucleotides surrounding the transcription start site (TSS) of a gene. Within this region, factors of the basal transcription machinery interact directly with DNA sequence motifs to ensure the proper recruitment of Pol II. Contrary to the simple picture in many textbooks, which often present the basal machinery as invariable and core promoters as generally sharing the same motifs, many recent studies have demonstrated the diversity in both basal transcription factor complexes and the sequence features to which they bind1, 2. We are only now beginning to understand the diversity at the transcription initiation level, and how it provides for additional regulatory control of gene expression3-6. Methods to systematically sequence 5′ complete transcripts have provided the breakthrough for genome-wide identification of TSSs7-9. In particular, the cap analysis of gene expression (CAGE) protocol has been used to generate comprehensive mammalian libraries of short sequence tags, which have led to the identification of distinct transcription initiation patterns10-12. In some promoters, transcription initiates from exactly the same location, while in others it initiates more uniformly across wider genomic windows. Different sequence features have been found to be associated with these patterns, such as an overrepresentation of the canonical TATA box sequence motif in ‘single-peak’ promoters, and CpG islands overlapping ‘broad range’ promoters11. The majority of studies using CAGE protocols have focused on mouse and human. 
Thus far there has been no attempt to investigate, on a similar scale, whether different initiation patterns can also be found in other animals, such as the fruit fly Drosophila melanogaster, and whether these are associated with distinct sequence features. The CAGE protocol has recently been extended to deepCAGE13, which involves a concatemerization step of reads, so the final library has to be sequenced on a platform (such as 454 pyrosequencing) that can generate sufficiently long reads. The number of reads obtained this way is smaller than what can be achieved on other sequencing platforms. In addition, deepCAGE produces a single, typically 20-nt-long, sequence tag from the most 5′ end of the transcript, which may be too short to guarantee a unique and correct alignment to the genome, especially in the presence of sequencing errors. Such challenges can be addressed by longer reads or paired-end reads. The latter strategy is expected to be more advantageous because it can provide additional information on the local transcript structure. In order to thoroughly characterize the landscape of transcription initiation in eukaryotes, we developed the Paired-End Analysis of TSSs (PEAT) strategy, by which each TSS tag (20 nt sequence from the most 5′ end of the transcript) is paired with a 20 nt downstream tag from the same gene. We applied PEAT to analyze capped transcripts of D. melanogaster mixed-stage embryos. Our results revealed that Drosophila, like mammals, has multiple initiation patterns, each of which is associated with a distinct set of sequence motifs. Furthermore, we found that ~25% of 5′ capped reads align to the coding region of the Drosophila genome. Extending the previous findings in mammals11, 14, we provide strong evidence that these transcripts result from posttranscriptional modification rather than de novo transcription from the coding region. 
Together, these results demonstrate that PEAT is an improved strategy to map and characterize the landscape of transcription initiation in higher eukaryotes.

RESULTS

A paired-end strategy for deep sequencing of capped RNA

In the PEAT protocol, capped transcripts are selectively ligated to a 5′ linker sequence, which contains an MmeI site, using an oligo-capping strategy. Reverse transcription is then carried out with a random hexamer tailed with a second MmeI site. After low-cycle PCR, cDNA products are circularized by bridge ligation followed by exonuclease digestion. The resulting DNA circles are subsequently amplified by rolling circle amplification15 and digested with MmeI to release paired tags, each of which contains a TSS tag and a downstream 3′ tag. After ligation with sequencing adaptors, the final PEAT library is sequenced on an Illumina Genome Analyzer with paired-end capability (Fig. 1a). Compared to conventional single-end TSS mapping strategies7, 16, the PEAT approach improves the alignment yield and accuracy and provides additional information on local transcript structure (such as linking 5′ TSS tags to known genes).

Figure 1

Paired-End Analysis of Transcriptional start sites (PEAT)

(a) Schematic outline of the PEAT strategy. The RNA fragment is shown as an arrowed line (red); the two MmeI sites introduced at the oligo-capping and reverse transcription (RT) steps are shown in green and purple, respectively. (b) Mapping efficiency of the reads that have built-in linker sequences, combined from two technical replicates. (c) The distribution of uniquely mapped 5′ and 3′ reads relative to known TSSs and other genomic regions. (d) Comparison between PEAT and microarray expression data. The 10,101 genes plotted had at least 1 mapped read-pair and were included in the microarray data. For the array data, expression level is the mean of simple background subtraction values across 3 replicates from mixed stage 0-11 D. melanogaster embryos. To estimate the expression level using paired-end sequencing data, we used the counts of 3′ tags that map to a transcribed region. The correlation coefficient was determined by Pearson correlation.

We employed the PEAT strategy to monitor global TSS usage in mixed stage embryos (0-24 h) of D. melanogaster. We obtained 17.5 million raw paired-reads from two technical replicates. For approximately 90% of the paired-reads, both the TSS and 3′ reads were distinguishable by their built-in linker sequences (Table 1). Of those paired-reads, 76% were mapped to a unique location in the fly genome. An additional 10% of pairs mapped to multiple genomic locations (Fig. 1b), possibly due to transposable elements or other regions with low sequence complexity (data not shown). The majority of 5′ reads were mapped to either a known TSS or its surrounding regions, confirming that our approach captured the very 5′ end of capped transcripts (Fig. 1c, Supplementary Fig. 1). The median distance between the 5′ and 3′ reads at the transcript level was 279 nt (Supplementary Fig. 2), and the 3′ reads mostly mapped to coding regions of annotated genes, indicating the success of the paired-end library construction.

Table 1

Summary of PEAT generated data

	Replicate 1	Replicate 2	Combined
Number of Read-Pairs with Identifiable Linker Sequences	8,258,735	7,470,183	15,728,918
Read-Pairs Mapped to a Unique Genomic Location	6,246,759	5,653,860	11,900,619
Read-Pairs Mapped to Multiple Genomic Locations	862,748	782,828	1,645,576
Non-Redundant Read-Pairs	4,103,558	3,752,136	7,062,714
Non-Redundant Read-Pairs Mapped to a Unique Genomic Location	1,688,228	1,569,274	2,716,981
Genes Represented by at Least 1 Read-Pair	11,111	11,073	11,418
Genes With an Identified Read Cluster Consisting of More Than 10 5′ Reads	--	--	8,577
Genes With an Identified Read Cluster Consisting of More Than 50 5′ Reads	--	--	5,563
Genes With an Identified Read Cluster Consisting of More Than 100 5′ Reads	--	--	4,007
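
The rounded percentages quoted in the text can be checked against the combined column of Table 1. The following is a minimal sketch that uses only the counts in the table and the rounded figure of 17.5 million raw pairs given in the text; the variable and function names are illustrative and do not come from the paper.

```python
# Combined counts from Table 1. The raw total is the rounded value quoted in the text.
raw_pairs = 17_500_000
with_linkers = 15_728_918   # read-pairs with identifiable linker sequences
unique = 11_900_619         # read-pairs mapped to a unique genomic location
multi = 1_645_576           # read-pairs mapped to multiple genomic locations


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


linker_pct = pct(with_linkers, raw_pairs)   # "approximately 90%" in the text
unique_pct = pct(unique, with_linkers)      # "76%" in the text
multi_pct = pct(multi, with_linkers)        # "an additional 10%" in the text

print(linker_pct, unique_pct, multi_pct)
```

The unique and multiple mapping fractions are taken relative to the read-pairs with identifiable linkers, which is the reading that reproduces the values quoted in the text.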

On average there were 256 tag pairs per gene (Supplementary Fig. 3), demonstrating deep coverage of the genome. 81.5% of genes currently annotated by FlyBase (v5.14) were represented by at least one read-pair, consistent with the notion that eukaryotic genomes are broadly transcribed. Taken together, the mapping yield was considerably higher than that of deepCAGE13. The fraction of aligned tags and coverage of the genome were also dramatically improved from a previous CAGE study of D. melanogaster10 (Supplementary Table 1, Supplementary Fig. 4). The two technical replicates were highly correlated (R = 0.98, Supplementary Fig. 5), indicating the reproducibility of PEAT. We next compared our results with a microarray-based expression dataset obtained from fly embryos of a similar broad developmental window (embryonic stages 0-11). With minimal normalization on both the array and sequence data, PEAT and array expression profiles were significantly correlated (R = 0.68, Fig. 1d and Supplementary Fig. 6). The result is comparable to the correlation observed between microarray and standard RNA-Seq approaches17. Therefore, the read count of the PEAT method can potentially be used to estimate transcript abundance. The paired-end strategy clearly allowed for an accurate mapping of the short reads. The addition of the 3′ reads enabled ~4% of the 5′ reads to be aligned to a unique genomic location instead of multiple locations. Furthermore, the downstream tags can also correct assignment mistakes caused by sequencing errors. In fact, ~0.3% of the 5′ reads would have been wrongly aligned if the downstream tag had not been provided (Supplementary Table 2). It is expected that such improvements will become more prominent for larger genomes or when the sequencing error rate is relatively high. Paired reads also made it possible to link novel TSSs directly to their respective genes. 
For 342,943 read pairs where the 5′ read fell more than 250 nt upstream of an annotated TSS, the corresponding 3′ read of 17% mapped to the transcribed region of the downstream gene. We successfully validated several individual cases of such distal unannotated TSSs (Supplementary Results; Supplementary Figs. 7-8 and Supplementary Table 3).
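
The linking criterion described above (a 5′ read more than 250 nt upstream of an annotated TSS whose paired 3′ read falls in the transcribed region of the downstream gene) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `Gene` record, its fields and the function names are hypothetical, and coordinates are assumed to be single-base positions on one chromosome.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    # Hypothetical minimal annotation record (not from the paper).
    name: str
    strand: str   # "+" or "-"
    tss: int      # annotated TSS coordinate
    tx_end: int   # last transcribed coordinate


def is_distal_upstream(five_prime, gene, min_dist=250):
    """True if the 5' read lies more than min_dist nt upstream of the annotated TSS."""
    if gene.strand == "+":
        return gene.tss - five_prime > min_dist
    return five_prime - gene.tss > min_dist


def in_transcribed_region(pos, gene):
    """True if pos falls between the annotated TSS and the transcript end."""
    lo, hi = sorted((gene.tss, gene.tx_end))
    return lo <= pos <= hi


def links_to_gene(five_prime, three_prime, gene, min_dist=250):
    """True if a distal upstream 5' read is tied to the gene by its paired 3' read."""
    return (is_distal_upstream(five_prime, gene, min_dist)
            and in_transcribed_region(three_prime, gene))


plus_gene = Gene("example_plus", "+", 1000, 3000)
minus_gene = Gene("example_minus", "-", 5000, 3000)
```

For example, `links_to_gene(600, 1200, plus_gene)` holds because the 5′ read is 400 nt upstream and the 3′ read is inside the gene, whereas a 5′ read at position 900 is only 100 nt upstream and does not qualify.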

Characterization of read clusters and initiation patterns

High-throughput TSS maps have shown that mammalian promoters exhibit diverse initiation patterns, but it was an open question whether Drosophila promoters would show a similar complexity. To address this, we clustered the mapped 5′ reads (Fig. 2a), resulting in 34,664 discrete clusters covering 8,577 genes. More than 5,500 genes had at least one cluster with ≥ 50 reads, and approximately half of these clusters overlapped annotated TSSs (Fig. 2b).

Figure 2

TSS clusters and initiation patterns identified in the Drosophila embryo

(a) The approach for identifying TSS clusters. A representative example (Chr. 2: 14516000-14516600) is shown. In essence, a smoothed density estimate of 5′ TSS tags was computed (blue line). Cluster boundaries were then set where the density exceeded a baseline score estimated on a genomic background (red line). TSS clusters were further condensed to the shortest interval containing 95% of the reads (dark shaded area). (b) The genomic locations of all clusters that contain ≥ 100 reads. Clusters overlapping an annotated TSS in FlyBase were classified as FlyBase TSS. For the remaining clusters, classifications were based on the mode of each given cluster and its location relative to annotated transcripts. (c) Size distribution of all clusters with ≥ 100 reads. Cluster sizes are similar to previous reports for mammals, with the majority of clusters shorter than 120 nt in length. (d) Definition of initiation patterns.
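
The last step in panel (a), condensing a cluster to the shortest interval that contains 95% of its reads, can be illustrated with a sliding-window search. This is a minimal sketch of that one step only; it does not reproduce the density smoothing or the genomic background baseline, and the function name and input layout (sorted positions with positive per-position read counts) are assumptions.

```python
def shortest_window(positions, counts, fraction=0.95):
    """Return (start, end) of the shortest span holding at least `fraction` of reads.

    positions: genomic coordinates sorted in ascending order.
    counts: number of 5' reads at each position (assumed positive).
    """
    total = sum(counts)
    if total == 0:
        return None
    need = fraction * total
    best = None
    left = 0
    running = 0
    for right in range(len(positions)):
        running += counts[right]
        # Drop positions from the left while the window still holds enough reads.
        while running - counts[left] >= need:
            running -= counts[left]
            left += 1
        if running >= need:
            span = positions[right] - positions[left]
            if best is None or span < best[1] - best[0]:
                best = (positions[left], positions[right])
    return best


# Toy cluster: 100 reads, most of them at three adjacent positions.
toy_positions = [100, 101, 102, 103, 150]
toy_counts = [1, 40, 50, 8, 1]
toy_window = shortest_window(toy_positions, toy_counts)
```

In the toy cluster, positions 101 to 103 hold 98 of the 100 reads, so the condensed cluster is (101, 103) and the two outlying reads are excluded.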

To determine transcription initiation patterns, we focused on 5′ clusters with ≥ 100 reads (5,699 clusters in 4,007 genes). The cutoff was stringent to ensure high-quality assignments of initiation patterns and sequence motifs. The clusters spanned a broad size range, describing a complex multimodal distribution (Fig. 2c) suggesting distinct initiation patterns. In fact, the cluster size distribution could be approximated by two Gaussian distributions, the intersection of which fell at ~25 nt. Read clusters were thus separated into three initiation patterns, Narrow with Peak (NP), Broad with Peak (BP) and Weak Peak (WP), along the two dimensions of cluster size and read distribution within each cluster (Fig. 2d).
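
The two-dimensional classification can be pictured with a small decision rule. The sketch below is only illustrative: the ~25 nt width cutoff comes from the text, but the exact definitions are given in Fig. 2d and are not reproduced here, so the peak criterion (the fraction of a cluster's reads near its mode) and its cutoff of 0.5 are hypothetical placeholders.

```python
def classify_cluster(width_nt, peak_fraction, width_cutoff=25, peak_cutoff=0.5):
    """Assign a read cluster to NP, BP or WP.

    width_nt: cluster size in nucleotides.
    peak_fraction: share of the cluster's reads close to the mode
        (hypothetical measure of read distribution, not from the paper).
    width_cutoff: ~25 nt, the intersection of the two Gaussians in the text.
    peak_cutoff: hypothetical threshold, chosen only for illustration.
    """
    if peak_fraction < peak_cutoff:
        return "WP"   # Weak Peak: no dominant initiation site
    if width_nt < width_cutoff:
        return "NP"   # Narrow with Peak
    return "BP"       # Broad with Peak


example_labels = [
    classify_cluster(10, 0.8),
    classify_cluster(60, 0.8),
    classify_cluster(60, 0.1),
]
```

Keeping the two cutoffs as parameters makes it explicit that the class boundaries are choices along cluster size and read distribution rather than fixed constants.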

Initiation patterns are linked to specific core promoter motifs

In mammals, ‘peak’ and ‘broad’ promoters tend to be associated with TATA box and CpG islands, respectively11. Since the fly genome does not contain CpG islands, it was intriguing to find that broad promoters exist in Drosophila. We therefore aimed to determine whether distinct initiation patterns were associated with core promoter motifs previously defined in Drosophila18. We extracted 200 nt sequences centered on the mode of each cluster (that is, the most frequent TSS within the cluster). Promoter sequences were aligned for each initiation pattern and the results showed that initiation preferentially occurs at an adenine, immediately preceded by the ‘TC’ di-nucleotide for all 3 initiation patterns (Fig. 3 and Table 2). The (T)CA consensus matched the minimal sequence requirements at the TSS as reported in other eukaryotes from yeast to mammals9, 11, but was only a substring of the fly initiator motif as originally reported19. Thus, even for the broad pattern, defining the reference TSS at the mode was linked to a significant presence of a minimal initiator consensus.

Figure 3

Promoter motifs associated with distinct promoter types

(a) The three initiation patterns, NP, BP and WP, are each represented by a candidate locus. The graphs show the relative percentage of 5′ reads that are mapped within a 100 nt window. (b) Sequence landscape in the promoter region of each pattern. The mode location of each cluster is set as reference point ‘+1’. Sequence logos of a 100 nt window are shown. (c) The core promoter motifs overrepresented for each initiation pattern. Significant motifs were identified in 200 nt core promoter sequences and binned into 5 nt intervals; only the 100 nt region surrounding the TSS is shown as no motifs were found to be enriched outside of this window. All bins with normalized motif occurrences of 5-fold enriched or above are shown. The percent of sequences containing at least one high-stringency instance of each motif in its preferred location is listed on the left side of the heat map.

Table 2

Frequency of consensus di- and tri-nucleotides relative to the TSSs and coding region

	T⁻²C⁻¹A⁺¹	C⁻¹A⁺¹	T⁻¹C⁺¹A⁺²	T⁻¹C⁺¹
Narrow with Peak	550 (44%)	858 (68%)	13 (1%)	86 (7%)
Broad with Peak	274 (36%)	483 (64%)	13 (2%)	50 (7%)
Weak Peak	387 (19%)	973 (48%)	46 (2%)	128 (6%)
Coding Region Read Cluster	24 (2%)	108 (8%)	476 (35%)	804 (59%)

Note: The +1 position within each cluster is defined by the mode of that cluster, regardless of its location in the genome. The analysis shown here compares coding region clusters with those near the start site of a gene. Of the 5,699 clusters, 426 that fell into intergenic or intronic regions were not included in the analysis.
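
The position convention behind Table 2 (the mode is +1, the base immediately upstream is −1, and there is no position 0) can be made concrete with a short check. This is a sketch under stated assumptions: the function name and the example sequence are made up for illustration, and the sequence is assumed to be given on the transcribed strand.

```python
def consensus_at_mode(seq, mode_idx):
    """Report which Table 2 patterns occur around the cluster mode.

    seq: sequence on the transcribed strand.
    mode_idx: 0-based index in seq of the +1 base (the cluster mode).
    """
    s = seq.upper()

    def base(offset):
        # Offsets follow the no-zero convention: ..., -2, -1, +1, +2, ...
        i = mode_idx + (offset - 1 if offset > 0 else offset)
        return s[i] if 0 <= i < len(s) else None

    return {
        "T-2 C-1 A+1": (base(-2), base(-1), base(1)) == ("T", "C", "A"),
        "C-1 A+1": (base(-1), base(1)) == ("C", "A"),
        "T-1 C+1 A+2": (base(-1), base(1), base(2)) == ("T", "C", "A"),
        "T-1 C+1": (base(-1), base(1)) == ("T", "C"),
    }


example_seq = "GGTCAGTT"
at_a = consensus_at_mode(example_seq, 4)   # mode on the A of TCA
at_c = consensus_at_mode(example_seq, 3)   # mode on the C of TCA
```

The same TCA trinucleotide matches the first two patterns when the mode sits on the A and the last two when the mode sits on the C, which is the one-base shift that separates the TSS clusters from the coding region clusters in Table 2.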

We next evaluated the presence and preferred locations within different initiation patterns of eight sequence motifs reported to be present in the core promoter regions in fly2, including the TATA box and INR motif, the Motif Ten Element (MTE), the Downstream Promoter Element (DPE)20, the DNA-replication related element (DRE)4 and Motifs 1, 6 and 7 (refs. 18, 21; see Online Methods). Strikingly, the results revealed distinct associations between initiation patterns and sequence motifs. The canonical core promoter motifs with previously known location bias (TATA, INR, DPE, MTE) were highly associated with NP promoters. The DPE was enriched at its known location (+26, +30) and at an additional site (−5, −1), which has previously been observed in mammalian data22; the second location likely reflects some overlap in sequence similarity rather than functional DPE occurrences, as the importance of precise spacing has been clearly established20. Mammalian WP promoters have frequently been associated with CpG islands, a feature not present in the fly genome2. Instead, Drosophila WP promoters were strongly associated with 3 motifs (Motif1, DRE, Motif7) and showed a moderate enrichment for Motif6 (Fig. 3b-c and Supplementary Fig. 9). BP promoters, which have characteristics of both NP and WP promoters, showed a combination of the most frequent motifs found in the other types. The largest span of motif enrichment was 25 nt for the DRE motif in WP promoters, reflecting the broad initiation pattern in this class. The associations of TATA box and DRE with different classes were also supported by differential binding of factors to the genome as assayed by chromatin immunoprecipitation (Supplementary Results). We noted that the INR and Motif1 motifs share a strongly conserved ‘TCA’ tri-nucleotide, that is, the minimal initiator consensus described above. 
Likewise, Motif6 was enriched at the same location as the TATA box and contains a minimal TAT consensus that is shared with the canonical TATA motif, suggesting that Motif6/Motif1 is an alternative to the classic TATA/INR motif pair, an observation that has only become apparent with high-resolution data generated by PEAT. Overall, these results demonstrated that the initiation patterns in fly directly reflected the presence of the specific core promoter motifs.

5′ capped read clusters in coding regions

In the initial clustering of reads, we observed that 25% of clusters were found within the coding region of an annotated gene, and cluster analysis showed that the majority of them belong to the WP class (Supplementary Table 4). Twelve candidates were selected for validation, and ten were confirmed by two independent methods, oligo-capping and cap-trapping (Fig. 4, Supplementary Figs. 10-12 and Supplementary Table 5). Therefore, these clusters were not artifacts of the high-throughput protocol and indeed contained a 5′ cap. Supporting this notion, recent studies in mammals have also identified a high prevalence of capped transcripts originating from the coding regions11, 14.

Figure 4

A distinct sequence motif identified for internally capped transcripts

(a-b) The gene structures of the PROD and RNPS1 loci indicating exons (thick bar) and introns (thin bar) from FlyBase are shown. A thick grey bar represents the UTR region. Grey areas highlight read clusters (≥ 100 reads/cluster). Green arrows denote primer locations for RT-PCR validation. A junction primer, which spans the linker and 5′ gene specific sequence at the cluster mode, together with a downstream primer (100-200 bp distance) were used to carry out RT-PCR. For each locus, cDNAs derived from RNA samples with (+) or without (−) linker ligation were used as template. The DNA ladder (M) is shown in the left lane. Sanger sequencing results show the correct position of the mode of the called TSS cluster for (a) a capped 5′ read cluster in the middle of a coding region; and (b) an example of a capped 5′ read cluster near the end of the coding region. (c) Sequence logo of a 100 nt window around the mode location (identified as ‘+1’) of all clusters containing more than 100 reads and mapping to a coding region.

Several mechanisms may underlie the biogenesis of internally capped transcripts. First, they might result from bona fide start sites in the coding region. Alternatively, these transcripts may be derived from longer precursors, for which the internal cap is introduced posttranscriptionally by a recapping mechanism14. Multiple lines of evidence from our data support the latter model. First, searching the 200 nt sequences surrounding the coding clusters revealed no overrepresentation of any of the core promoter motifs observed near bona fide TSSs (Supplementary Table 4); this was in agreement with our previous observation of a lack of promoter motifs around mammalian coding clusters23. The analysis of ChIP data6 showed frequent binding of TFs (TBP and/or TRF2) at TSS clusters but not at coding clusters (Supplementary Results). Together, our data suggested that 5′ capped coding clusters are unlikely to be initiated by Pol II. In addition, we found that for 69% of the coding clusters, a larger cluster (with more reads) was identified near the annotated TSS (Supplementary Fig. 13), indicating that internally capped transcripts were often accompanied by more abundant full-length transcripts. Moreover, the locations of the 5′ coding region clusters were spread evenly across the exons except for a lack of clusters at the extreme 3′ end of the exon (Supplementary Fig. 14), similar to what has been reported in mammals14. The mammalian study relied on TSS reads mapped across exon junctions, which are a tiny fraction of the total reads, to argue that recapping is a posttranscriptional event. Unique to the PEAT dataset, we observed that the downstream paired tags of the coding clusters were predominantly located in well-annotated exons rather than introns (~100-fold enrichment), indicating that internally capped transcripts were spliced or at least partially spliced. We also observed a distinct short sequence motif when aligning the sequences surrounding the mode of coding clusters. 
While this motif was at first glance reminiscent of the minimal initiator motif found in TSS clusters, it exhibited unique properties. ‘CA’ was the most frequent di-nucleotide at the −1 position in TSS read clusters, while the most prominent di-nucleotide at the mode location within coding region clusters was ‘TC’ (Table 2). Although the molecular mechanism of recapping remains elusive, the distinct motif implied that recapping might depend on specific sequences or protein factors.

DISCUSSION

PEAT distinguishes itself from other paired-end transcriptome sequencing strategies such as GIS-PET24. While GIS-PET also generates a TSS tag paired with a 3′ tag, the downstream tags are designed to query polyadenylation sites. Unlike in PEAT, this fixed 3′ tag location cannot provide local transcript structure proximal to the TSS and cannot resolve recapping events. Extending previous observations based on ESTs25, the high-resolution initiation map generated by PEAT allowed the identification of 3 distinct initiation patterns in Drosophila. Initiation patterns were linked to the presence of specific core promoter sequence motifs, including an enrichment of the DRE motif in WP promoters. The presence of WP promoters in Drosophila is intriguing as broad promoters in mammals are enriched for CpG islands11, a genomic feature not present in the fly. Notably, CpG islands and DRE are associated with housekeeping genes in human11 and fly25 respectively, indicating functional conservation of WP promoters in diverse organisms. Moreover, ChIP data supported the notion that distinct complexes are associated with WP and NP promoters in fly (Supplementary Results, Supplementary Figs. 15-16 and Supplementary Table 6). As their initiation patterns suggest, the class of BP promoters contained a combination of the motifs seen in the other two classes. However, it is unclear whether this is a consequence of different complexes recognizing the same regulatory region, or if this occurs at different transcripts under the same condition. Our mixed-stage embryonic sample contains both maternal and zygotic transcripts, and vertebrate transcription in oocytes has recently been shown to depend on stage-specific basal transcription initiation complexes26, 27. Additionally, we provided multiple lines of evidence that internally capped transcripts are likely derived from post-transcriptional processing events in fly, as suggested by a previous mammalian study14. 
Recapping sites were uniformly distributed across the internal exon except at its extreme 3′ end. This coincides with the position of the exon junction complex (EJC), which is deposited 20-24 nt upstream of splicing junctions28. Since both the earlier report and our study suggest that internally capped transcripts are likely to be derived from processed (or spliced) transcripts, we speculate that the depletion of recapping sites at the end of the exon may reflect competition between the EJC and the recapping machinery. Further investigations are required to elucidate the biogenesis and functional significance of this novel class of transcripts29. Lastly, this study focused on initiation sites of long polyadenylated transcripts. This explains why we did not observe promoter-associated non-coding transcripts, which have been reported in other species30, or the short transcripts associated with polymerase stalling31. An earlier study using total RNA detected a large number of transcribed fragments (transfrags) that are well upstream of known TSSs and correlate in expression with the downstream genes32. We showed that such distant TSSs are relatively rare for polyadenylated and capped transcripts, and are unlikely to be the initiation sites for known downstream transcripts. Although one cannot rule out that the observed differences are due to stage variation (mixed stage library vs. several 2-hr windows), this suggests that these transfrags are not polyadenylated or capped, or both, and that they may represent instances of a class of regulatory RNAs (for example, promoter associated long RNAs) in the fly transcriptome. Further efforts are required to profile and characterize different classes of RNA to dissect the complexity and plasticity of eukaryotic transcriptomes.

37 in total

1. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences.

Authors: G Z Hertz; G D Stormo
Journal: Bioinformatics Date: 1999 Jul-Aug Impact factor: 6.937

2. Enhancer-promoter specificity mediated by DPE or TATA core promoter motifs.

Authors: J E Butler; J T Kadonaga
Journal: Genes Dev Date: 2001-10-01 Impact factor: 11.361

3. Genome-wide analysis of mammalian promoter architecture and evolution.

Authors: Piero Carninci; Albin Sandelin; Boris Lenhard; Shintaro Katayama; Kazuro Shimokawa; Jasmina Ponjavic; Colin A M Semple; Martin S Taylor; Pär G Engström; Martin C Frith; Alistair R R Forrest; Wynand B Alkema; Sin Lam Tan; Charles Plessy; Rimantas Kodzius; Timothy Ravasi; Takeya Kasukawa; Shiro Fukuda; Mutsumi Kanamori-Katayama; Yayoi Kitazume; Hideya Kawaji; Chikatoshi Kai; Mari Nakamura; Hideaki Konno; Kenji Nakano; Salim Mottagui-Tabar; Peter Arner; Alessandra Chesi; Stefano Gustincich; Francesca Persichetti; Harukazu Suzuki; Sean M Grimmond; Christine A Wells; Valerio Orlando; Claes Wahlestedt; Edison T Liu; Matthias Harbers; Jun Kawai; Vladimir B Bajic; David A Hume; Yoshihide Hayashizaki
Journal: Nat Genet Date: 2006-04-28 Impact factor: 38.330

4. Biological function of unannotated transcription during the early development of Drosophila melanogaster.

Authors: J Robert Manak; Sujit Dike; Victor Sementchenko; Philipp Kapranov; Frederic Biemar; Jeff Long; Jill Cheng; Ian Bell; Srinka Ghosh; Antonio Piccolboni; Thomas R Gingeras
Journal: Nat Genet Date: 2006-09-03 Impact factor: 38.330

5. Transcription of histone gene cluster by differential core-promoter factors.

Authors: Yoh Isogai; Sündüz Keles; Matthias Prestel; Andreas Hochheimer; Robert Tjian
Journal: Genes Dev Date: 2007-10-31 Impact factor: 11.361

6. The transcriptional landscape of the mammalian genome.

Authors: P Carninci; T Kasukawa; S Katayama; J Gough; M C Frith; N Maeda; R Oyama; T Ravasi; B Lenhard; C Wells; R Kodzius; K Shimokawa; V B Bajic; S E Brenner; S Batalov; A R R Forrest; M Zavolan; M J Davis; L G Wilming; V Aidinis; J E Allen; A Ambesi-Impiombato; R Apweiler; R N Aturaliya; T L Bailey; M Bansal; L Baxter; K W Beisel; T Bersano; H Bono; A M Chalk; K P Chiu; V Choudhary; A Christoffels; D R Clutterbuck; M L Crowe; E Dalla; B P Dalrymple; B de Bono; G Della Gatta; D di Bernardo; T Down; P Engstrom; M Fagiolini; G Faulkner; C F Fletcher; T Fukushima; M Furuno; S Futaki; M Gariboldi; P Georgii-Hemming; T R Gingeras; T Gojobori; R E Green; S Gustincich; M Harbers; Y Hayashi; T K Hensch; N Hirokawa; D Hill; L Huminiecki; M Iacono; K Ikeo; A Iwama; T Ishikawa; M Jakt; A Kanapin; M Katoh; Y Kawasawa; J Kelso; H Kitamura; H Kitano; G Kollias; S P T Krishnan; A Kruger; S K Kummerfeld; I V Kurochkin; L F Lareau; D Lazarevic; L Lipovich; J Liu; S Liuni; S McWilliam; M Madan Babu; M Madera; L Marchionni; H Matsuda; S Matsuzawa; H Miki; F Mignone; S Miyake; K Morris; S Mottagui-Tabar; N Mulder; N Nakano; H Nakauchi; P Ng; R Nilsson; S Nishiguchi; S Nishikawa; F Nori; O Ohara; Y Okazaki; V Orlando; K C Pang; W J Pavan; G Pavesi; G Pesole; N Petrovsky; S Piazza; J Reed; J F Reid; B Z Ring; M Ringwald; B Rost; Y Ruan; S L Salzberg; A Sandelin; C Schneider; C Schönbach; K Sekiguchi; C A M Semple; S Seno; L Sessa; Y Sheng; Y Shibata; H Shimada; K Shimada; D Silva; B Sinclair; S Sperling; E Stupka; K Sugiura; R Sultana; Y Takenaka; K Taki; K Tammoja; S L Tan; S Tang; M S Taylor; J Tegner; S A Teichmann; H R Ueda; E van Nimwegen; R Verardo; C L Wei; K Yagi; H Yamanishi; E Zabarovsky; S Zhu; A Zimmer; W Hide; C Bult; S M Grimmond; R D Teasdale; E T Liu; V Brusic; J Quackenbush; C Wahlestedt; J S Mattick; D A Hume; C Kai; D Sasaki; Y Tomaru; S Fukuda; M Kanamori-Katayama; M Suzuki; J Aoki; T Arakawa; J Iida; K Imamura; M Itoh; T Kato; H Kawaji; N Kawagashira; T Kawashima; 
M Kojima; S Kondo; H Konno; K Nakano; N Ninomiya; T Nishio; M Okada; C Plessy; K Shibata; T Shiraki; S Suzuki; M Tagami; K Waki; A Watahiki; Y Okamura-Oho; H Suzuki; J Kawai; Y Hayashizaki
Journal: Science Date: 2005-09-02 Impact factor: 47.728

7. Mapping of transcription start sites in Saccharomyces cerevisiae using 5' SAGE.

Authors: Zhihong Zhang; Fred S Dietrich
Journal: Nucleic Acids Res Date: 2005-05-19 Impact factor: 16.971

8. Comparative genomics of Drosophila and human core promoters.

Authors: Peter C FitzGerald; David Sturgill; Andrey Shyakhtenko; Brian Oliver; Charles Vinson
Journal: Genome Biol Date: 2006 Impact factor: 13.583

9. FlyBase: integration and improvements to query tools.

Authors: Robert J Wilson; Joshua L Goodman; Victor B Strelets
Journal: Nucleic Acids Res Date: 2007-12-26 Impact factor: 16.971

Review 10. Mammalian RNA polymerase II core promoters: insights from genome-wide studies.

Authors: Albin Sandelin; Piero Carninci; Boris Lenhard; Jasmina Ponjavic; Yoshihide Hayashizaki; David A Hume
Journal: Nat Rev Genet Date: 2007-05-08 Impact factor: 53.242
