Junya Yamagishi, Hiroyuki Wakaguri, Akio Ueno, Youn-Kyoung Goo, Mohammed Tolba, Makoto Igarashi, Yoshifumi Nishikawa, Chihiro Sugimoto, Sumio Sugano, Yutaka Suzuki, Junichi Watanabe, Xuenan Xuan.
Abstract
In recent years, a method has been established that collects, in a high-throughput manner, the precise positions of transcriptional start sites (TSSs) together with digital information on gene-expression levels. We applied this novel method, 'tss-seq', to elucidate the transcriptome of Toxoplasma gondii tachyzoites. This identified about 124,000 TSSs, which were clustered into about 10,000 transcription regions (TRs) by a statistics-based analysis. Pairing the TRs with annotated ORFs left 30% of the TRs and 40% of the ORFs without counterparts, which points to undiscovered genes and stage-specific transcription, respectively. This large TSS data set made possible the first systematic analysis of the T. gondii core promoter structure. The analysis showed that T. gondii uses an initiator-like motif as its major core promoter element, together with a novel motif, the downstream thymidine cluster, which resembles the Y patch observed in plants. This comprehensive analysis also suggested that the TATA box and other well-known core promoter elements are rarely used.
Year: 2010 PMID: 20522451 PMCID: PMC2920756 DOI: 10.1093/dnares/dsq013
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Global statistics
| Category | Number of tags |
|---|---|
| Total obtained tags | 6 801 945 |
| Unique obtained tags | 1 336 060 |
| Total valid tags with 0–2 mismatches | 4 082 575 |
| Total valid tags: uniquely mapped | 4 027 161 |
| Total valid tags: uniquely mapped, without mismatches | 2 590 719 |
| Total valid tags: uniquely mapped, with 1 mismatch | 972 811 |
| Total valid tags: uniquely mapped, with 2 mismatches | 463 631 |
| Total valid tags: redundantly mapped | 55 414 |
| Unique valid tags with 0–2 mismatches | 450 852 |
| Unique valid tags: uniquely mapped | 440 690 |
| Unique valid tags: redundantly mapped | 10 162 |
| TSSs | 124 217 |
Figure 1. Statistical profile of TSSs in T. gondii. The count of each unique tag was log-transformed and plotted on the horizontal axis; the rank of the tag by count was log-transformed and plotted on the vertical axis. In total, 440 690 tags comprising 124 217 TSSs were examined.
Positional comparison between TSS cluster and gene
| Item | Number of items |
|---|---|
| TSS clusters (total) | 10 508 |
| TSS clusters with assigned ORF | 7447 |
| TSS clusters without assigned ORF | 3061 |
| ORFs (total) | 7923 |
| ORFs with assigned TSS cluster | 4363 |
| ORFs without assigned TSS cluster | 3560 |
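The unpaired fractions quoted in the abstract can be recomputed from the counts in this table. The short Python sketch below performs only that arithmetic; the function and variable names are illustrative and do not come from the paper.

```python
# Counts copied from the table "Positional comparison between TSS cluster and gene".
TSS_CLUSTERS_WITH_ORF = 7447
TSS_CLUSTERS_WITHOUT_ORF = 3061
ORFS_WITH_CLUSTER = 4363
ORFS_WITHOUT_CLUSTER = 3560


def unpaired_fraction(paired: int, unpaired: int) -> float:
    """Share of items that have no counterpart."""
    return unpaired / (paired + unpaired)


tr_unpaired = unpaired_fraction(TSS_CLUSTERS_WITH_ORF, TSS_CLUSTERS_WITHOUT_ORF)
orf_unpaired = unpaired_fraction(ORFS_WITH_CLUSTER, ORFS_WITHOUT_CLUSTER)

print(f"TSS clusters without an ORF: {tr_unpaired:.1%}")
print(f"ORFs without a TSS cluster:  {orf_unpaired:.1%}")
```

With these counts the unpaired share comes to roughly 29% of the TSS clusters and 45% of the ORFs.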
Figure 2. Distribution of tag sequences mapped on the genome of T. gondii. The coding regions for representative tachyzoite-specific genes, SAG1, SRS2, LDH1, and ENO2, are shown by grey boxes. Those for bradyzoite-specific genes, BAG1, LDH2, and ENO1, are shown by white boxes. The length of the horizontal lines represents the number of observed tags, and the position represents the TSS.
Figure 3. Core promoter structure in T. gondii. (A) Distribution of nucleotides in the core ±100 region. The core promoter area ±100 nucleotides around the TSS is shown on the horizontal axis, and the observed frequency is plotted on the vertical axis. The black line, dotted black line, grey line, and dotted grey line represent the occurrence frequency of adenine, thymine, cytosine, and guanine, respectively. (B) Distribution of motifs in the core ±100 region. Their PWMs and the thresholds applied for evaluation are described in the text. (C) Model of the core promoter structure in T. gondii. The two boxes represent the tgINR consensus sequence and the DTC. An arrowhead represents a TSS. The region where the TATA box would be expected is shown by a crossed-out grey box.
Frequently observed sequences at TSSs
| Sequence | Matrix | Freq. | Sequence | Matrix | Freq. |
|---|---|---|---|---|---|
| TCACTTT | tgINRT | 88 | TCACAAA | tgINRA | 26 |
| TCATTTT | tgINRT | 63 | CCAGTTC | tgINRT | 25 |
| CCACTTT | tgINRT | 60 | ACACTTT | tgINRT | 25 |
| GCATTTT | tgINRT | 40 | ACATTTT | tgINRT | 25 |
| TCAGTTT | tgINRT | 39 | CCAGAAA | tgINRA | 24 |
| TCACTTC | tgINRT | 34 | CCATTTC | tgINRT | 23 |
| CCAGTTT | tgINRT | 33 | TCATTCT | tgINRT | 22 |
| GCAGAAA | tgINRA | 32 | CCACTTC | tgINRT | 22 |
| TCATTTC | tgINRT | 32 | TCAGTTC | tgINRT | 21 |
| TCAGAAA | tgINRA | 31 | ACATTTC | tgINRT | 20 |
| GCATTTC | tgINRT | 27 | ACAGAAA | tgINRA | 20 |
| CCATTTT | tgINRT | 27 | TCAATTT | tgINRT | 19 |
| TCACATT | tgINRT | 26 | GCAGTTT | tgINRT | 19 |
| GCACTTT | tgINRT | 26 | CCATTCT | tgINRT | 19 |
| ACAGTTT | tgINRT | 26 | GCATTCT | tgINRT | 18 |

The Matrix column names the frequency matrix (tgINRT consensus or tgINRA consensus) to which each sequence was subjected.
PFM for the tgINRT and tgINRA
| Motif | Nucleotide | First | Second | Third | Fourth | Fifth | Sixth | Seventh |
|---|---|---|---|---|---|---|---|---|
| tgINRT | A | 0.12 | 0.00 | 1.00 | 0.02 | 0.03 | 0.00 | 0.00 |
| tgINRT | T | 0.44 | 0.00 | 0.00 | 0.41 | 0.97 | 0.92 | 0.74 |
| tgINRT | G | 0.17 | 0.00 | 0.00 | 0.21 | 0.00 | 0.00 | 0.00 |
| tgINRT | C | 0.27 | 1.00 | 0.00 | 0.36 | 0.00 | 0.08 | 0.26 |
| tgINRA | A | 0.15 | 0.00 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 |
| tgINRA | T | 0.43 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| tgINRA | G | 0.24 | 0.00 | 0.00 | 0.80 | 0.00 | 0.00 | 0.00 |
| tgINRA | C | 0.18 | 1.00 | 0.00 | 0.20 | 0.00 | 0.00 | 0.00 |
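A position frequency matrix (PFM) of this kind can be derived by weighting each aligned sequence by its observed frequency. The Python sketch below does this for the five sequences attributed to the tgINRA matrix in the table of frequently observed sequences at TSSs. Whether the authors built their matrices in exactly this way is an assumption; the function name `build_pfm` is illustrative.

```python
from collections import Counter

# (sequence, frequency) pairs attributed to the tgINRA matrix in the
# table "Frequently observed sequences at TSSs".
TGINRA_SEQS = [
    ("GCAGAAA", 32),
    ("TCAGAAA", 31),
    ("TCACAAA", 26),
    ("CCAGAAA", 24),
    ("ACAGAAA", 20),
]


def build_pfm(weighted_seqs):
    """Return one {base: fraction} dict per position, weighted by frequency."""
    length = len(weighted_seqs[0][0])
    total = sum(weight for _, weight in weighted_seqs)
    pfm = []
    for pos in range(length):
        counts = Counter()
        for seq, weight in weighted_seqs:
            counts[seq[pos]] += weight
        # Counter returns 0 for bases never seen at this position.
        pfm.append({base: counts[base] / total for base in "ATGC"})
    return pfm


pfm = build_pfm(TGINRA_SEQS)
for base in "ATGC":
    print(base, " ".join(f"{column[base]:.2f}" for column in pfm))
```

Rounded to two decimals, the output agrees with the tgINRA rows of the PFM table (for example 0.43 for T at the first position and 0.80 for G at the fourth). Applying the same function to the sequences attributed to tgINRT appears to reproduce the tgINRT rows as well.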
Frequently observed sequences from +3 to +14
| Rank | Sequence | Freq. | Rank | Sequence | Freq. |
|---|---|---|---|---|---|
| 1 | TTTTCT | 510 | 16 | TGTTTT | 220 |
| 2 | TTTTTC | 483 | 17 | TTTCTG | 212 |
| 3 | TTTTTT | 455 | 18 | TTTTCG | 207 |
| 4 | TTTCTT | 425 | 19 | TCTTTC | 204 |
| 5 | TTCTTT | 360 | 20 | TTTTTG | 201 |
| 6 | TCTTTT | 324 | 21 | TTTTGT | 197 |
| 7 | TTTCTC | 320 | 22 | TTGTTT | 183 |
| 8 | TTTTCC | 319 | 23 | TTCCTT | 183 |
| 9 | CTTTTC | 290 | 24 | TTTGTT | 174 |
| 10 | TTCTTC | 282 | 25 | TCTCTT | 169 |
| 11 | TTTCCT | 256 | 26 | CTTTCT | 168 |
| 12 | GTTTTT | 240 | 27 | TTTCGT | 164 |
| 13 | TTCTCT | 236 | 28 | ATTTTT | 162 |
| 14 | CTTTTT | 232 | 29 | GTTTTC | 162 |
| 15 | TCTTCT | 225 | 30 | TTTTCA | 161 |
| 44 | AAAAAA | 123 | 71 | AAAAAG | 98 |
| 61 | AAGAAA | 103 | 79 | AAAGAA | 93 |
| 67 | AAAAGA | 100 | 89 | GAAAAA | 87 |
| 69 | AGAAAA | 99 | 109 | AAAACA | 80 |
A-rich hexamers (ranks 44 to 109) are selected and shown in the last four rows of the table.
PFM for the DTC
| Nucleotide | First | Second | Third | Fourth | Fifth | Sixth |
|---|---|---|---|---|---|---|
| A | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 |
| T | 0.84 | 0.85 | 0.84 | 0.75 | 0.68 | 0.63 |
| G | 0.05 | 0.03 | 0.02 | 0.02 | 0.05 | 0.08 |
| C | 0.09 | 0.12 | 0.14 | 0.22 | 0.27 | 0.27 |
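To look for a motif such as the DTC in a sequence, a PFM is commonly converted to a log-odds position weight matrix (PWM) and slid along the sequence. The Python sketch below illustrates that general technique with the DTC values from the table. The pseudocount, the uniform background, and the example window are assumptions made for illustration; the PWM and threshold actually used in the paper are described in its text and are not reproduced here.

```python
import math

# DTC position frequency matrix, copied from the table above.
DTC_PFM = {
    "A": [0.02, 0.00, 0.00, 0.00, 0.00, 0.02],
    "T": [0.84, 0.85, 0.84, 0.75, 0.68, 0.63],
    "G": [0.05, 0.03, 0.02, 0.02, 0.05, 0.08],
    "C": [0.09, 0.12, 0.14, 0.22, 0.27, 0.27],
}
PSEUDOCOUNT = 0.01  # assumption: avoids log(0) for unobserved bases
BACKGROUND = 0.25   # assumption: uniform base composition


def to_pwm(pfm):
    """Convert a PFM to a list of per-position log2-odds dicts."""
    width = len(pfm["A"])
    pwm = []
    for i in range(width):
        column_total = sum(pfm[b][i] + PSEUDOCOUNT for b in "ATGC")
        pwm.append({
            b: math.log2((pfm[b][i] + PSEUDOCOUNT) / column_total / BACKGROUND)
            for b in "ATGC"
        })
    return pwm


def score(pwm, site):
    """Sum the log-odds of each base in a site as long as the PWM."""
    return sum(column[base] for column, base in zip(pwm, site))


def best_hit(pwm, seq):
    """Return (score, offset) of the highest-scoring window in seq."""
    width = len(pwm)
    hits = [(score(pwm, seq[i:i + width]), i)
            for i in range(len(seq) - width + 1)]
    return max(hits)


pwm = to_pwm(DTC_PFM)
downstream = "GATTTTCTTCAG"  # hypothetical 12-nt window, e.g. +3 to +14
best_score, offset = best_hit(pwm, downstream)
print(f"best window: {downstream[offset:offset + 6]} (score {best_score:.2f})")
```

A site would then be called motif-positive when its best score exceeds a chosen threshold; that cut-off is a design choice this sketch deliberately leaves open.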
The Mann–Whitney test for core promoter motifs and transcription activities
| Object 1 motif | Object 1 median | Object 2 motif | Object 2 median | P-value |
|---|---|---|---|---|
| tgINRT | 32 | INR negative | 22 | 0* |
| tgINRA | 27 | INR negative | 22 | 8.02 × 10⁻⁸ |
| tgINRT | 32 | tgINRA | 27 | 3.11 × 10⁻⁴ |
| DTC at +3 | 29 | DTC negative at +3 | 24 | 2.42 × 10⁻⁵ |
| DTC at +9 | 27 | DTC negative at +9 | 24 | 9.78 × 10⁻⁴ |
| DTC at +19 | 29 | DTC negative at +19 | 24 | 8.50 × 10⁻³ |
| DTC at +3 | 29 | DTC at +9 | 27 | 8.22 × 10⁻¹ |
| DTC at +3 | 29 | DTC at +19 | 29 | 9.83 × 10⁻¹ |
| DTC at +9 | 27 | DTC at +19 | 29 | 8.72 × 10⁻¹ |
*The z-value (11.083) was too large for the exact P-value to be calculated.
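For readers unfamiliar with the test, the Python sketch below computes the Mann–Whitney U statistic and its normal-approximation z-value for two samples, then converts a z-value to a two-sided P-value. It is a generic textbook version with tie correction and no continuity correction, not the authors' software, and the sample values are hypothetical.

```python
import math


def mann_whitney_z(x, y):
    """Return (U for x, z) using the normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        t = j - i                  # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return u, (u - mean_u) / math.sqrt(var_u)


def two_sided_p(z):
    """Two-sided P-value of a standard normal z-value."""
    return math.erfc(abs(z) / math.sqrt(2))


with_motif = [32, 41, 28, 35, 30]     # hypothetical values
without_motif = [22, 19, 25, 21, 27]  # hypothetical values
u, z = mann_whitney_z(with_motif, without_motif)
print(f"U = {u}, z = {z:.3f}, P = {two_sided_p(z):.4f}")
print(f"P for z = 11.083: {two_sided_p(11.083):.3g}")
```

Under this approximation a z-value of 11.083 corresponds to a P-value many orders of magnitude smaller than any other entry in the table, which is presumably why that entry is reported as 0.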