DBTSS: DataBase of Human Transcription Start Sites, progress report 2006.

Riu Yamashita, Yutaka Suzuki, Hiroyuki Wakaguri, Katsuki Tsuritani, Kenta Nakai, Sumio Sugano.

Abstract

DBTSS was first constructed in 2002 based on precise, experimentally determined 5' end clones. Several major updates and additions have been made since the last report. First, the number of human clones has drastically increased, going from 190,964 to 1,359,000. Second, information about potential alternative promoters is presented because the number of 5' end clones is now sufficient to determine several promoters for one gene. Namely, we defined putative promoter groups by clustering transcription start sites (TSSs) separated by <500 bases. A total of 8308 human genes and 4276 mouse genes were found to have putative multiple promoters. Third, DBTSS provides detailed sequence comparisons of user-specified TSSs. Finally, we have added TSS information for zebrafish, malaria and schyzon (a red alga model organism). DBTSS is accessible at http://dbtss.hgc.jp.

Year: 2006   PMID: 16381981   PMCID: PMC1347491   DOI: 10.1093/nar/gkj129

Source DB: PubMed   Journal: Nucleic Acids Res   ISSN: 0305-1048   Impact factor: 16.971


INTRODUCTION

Recently, a huge amount of comprehensive expression profile data obtained by various experiments, such as microarrays, has been made available. Uncovering the regulatory networks among the expressed genes from these data is a challenging problem. Information about promoters, which contain most of the binding sites of transcription factors, is indispensable for solving it. To define promoter regions, precise information about transcription start sites (TSSs) is also required. Such data, however, are not easily obtained, because the cDNA sequence data in repository sequence databases provide no guarantees regarding the 5′ end of the sequences and because the computational prediction of promoters and TSSs remains problematic (1). To overcome these difficulties, several databases (2), including DBTSS (DataBase of Transcription Start Sites), have been constructed. DBTSS contains TSS information of genes based on specific experiments (3,4). Clones constructed by full-length cDNA methods such as oligo-capping (5,6) or CAP-trapper (7,8) are mapped onto genome sequences to determine TSSs. Each TSS is determined based on the 5′ end of the corresponding clone. DBTSS was first constructed in 2002 and has since been improved by several major and minor updates. The original version (version 1) contained only human data (3). Two years later, we reported the addition of mouse TSS information (9) in version 3 (4). Here we introduce the new updates and additions since version 3, the most important one being the addition of putative alternative promoter information.

NEW FEATURES

The current version of DBTSS, version 5, includes some notable improvements since the previous report, in addition to minor updates such as modifications of the interface and the result views. One major improvement is that the amount of data for human TSSs has been significantly increased: in our report in 2004, we described 190 964 human clones, which corresponded to 11 234 NCBI reference sequence cDNAs (RefSeq) (4). Because we added data from a new full-length cDNA project (10), DBTSS now contains 1 359 000 clones corresponding to 19 753 RefSeq cDNAs (Table 1). Since RefSeq cDNAs contain splicing variants as separate entries, we clustered the clone information according to coordinates in the genome sequence; if sequences overlapped, we regarded them as the same locus. After clustering, our data correspond to 15 262 genes (Table 1). This is one of the largest collections of human 5′ end cDNA sequences.
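
The locus clustering described above amounts to merging overlapping genomic intervals. The following is a minimal sketch of that idea, not the authors' implementation: the tuple layout, the function name `cluster_into_loci`, the inclusive coordinates and the requirement that overlapping sequences share a strand are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical record layout: (chromosome, strand, start, end), inclusive coordinates.
Interval = Tuple[str, str, int, int]


def cluster_into_loci(mapped: List[Interval]) -> List[Interval]:
    """Merge mapped sequences whose genomic coordinates overlap into loci."""
    loci: List[Interval] = []
    # Sorting groups entries by chromosome and strand, then orders them by start.
    for chrom, strand, start, end in sorted(mapped):
        if loci:
            l_chrom, l_strand, l_start, l_end = loci[-1]
            same_region = (chrom, strand) == (l_chrom, l_strand)
            if same_region and start <= l_end:
                # Overlap with the current locus: extend it instead of opening a new one.
                loci[-1] = (l_chrom, l_strand, l_start, max(l_end, end))
                continue
        loci.append((chrom, strand, start, end))
    return loci


example = [
    ("chr16", "-", 100, 500),
    ("chr16", "-", 450, 900),    # overlaps the first entry, so same locus
    ("chr16", "-", 2000, 2500),  # no overlap, so a separate locus
    ("chr16", "+", 480, 700),    # opposite strand, kept apart (an assumption)
]
loci = cluster_into_loci(example)
print(len(loci), "loci")
```

Under this scheme, splicing variants that are separate RefSeq entries but map to overlapping coordinates collapse into one locus, which is why the gene count in Table 1 is smaller than the RefSeq count.
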
Table 1

Statistics of DBTSS

Species      No. of genes/no. of RefSeq   No. of promoters   No. of TSSs   No. of clones
Human        15 262/19 753                30 964             452 117       1 359 000
Mouse        14 162/14 746                19 023             149 876       364 487
Zebrafish    3061/3075                    3382               15 198        32 263
Malaria      1527/NA                      NA                 6908          10 236
Schyzon      3635/NA                      NA                 14 029        22 923

To check the quality of our TSS data, we compared DBTSS with the Eukaryotic Promoter Database (EPD) (2). EPD Release 82 contains 1871 promoters collected from the literature. Of these, we could map 1767 promoter sequences to the human genome, and 1639 of them (92.8%) mapped within 100 bases of DBTSS TSSs, indicating that the data in DBTSS are consistent with data obtained by conventional methods. In the next two sections, we discuss two other major updates: alternative promoters (APs) and promoter comparison.
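
The consistency check above can be illustrated with a short sketch. It assumes that TSS positions for a single chromosome and strand are available as a sorted list of integers and that "within 100 bases" is inclusive; the function name `within_distance` is hypothetical, and the actual comparison procedure is not described in the text.

```python
from bisect import bisect_left
from typing import List


def within_distance(sorted_tss: List[int], position: int, max_dist: int = 100) -> bool:
    """Return True if any TSS in sorted_tss lies within max_dist bases of position."""
    i = bisect_left(sorted_tss, position)
    # The nearest TSS is either the one just before or the one at the insertion point.
    neighbours = sorted_tss[max(i - 1, 0): i + 1]
    return any(abs(t - position) <= max_dist for t in neighbours)


# Agreement rate from the counts reported in the text:
# 1639 of the 1767 mapped EPD promoters lie within 100 bases of a DBTSS TSS.
agreement = 1639 / 1767
print(f"{agreement:.1%}")
```

The binary search keeps each lookup logarithmic in the number of TSSs, which matters when the reference set holds hundreds of thousands of positions.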

ALTERNATIVE PROMOTERS

Several genes are known to have multiple promoters that can be regulated differently. These promoters, labeled as APs, could be useful to maximally exploit the relatively limited number of genes in the genome (11). However, no estimate of how many genes might have alternative promoters is available to date. Since DBTSS now has enough 5′ end clones from human and mouse, we made this estimate. This is the most important addition in version 5. Although the details of our analysis will be reported elsewhere (12), the procedure is summarized below.

To determine APs, we first collected all the TSSs from the same locus. TSSs located inside a RefSeq gene exon, with the exception of the first one, were removed in order to avoid artifacts caused by truncated 5′ ends. We used several intervals to define AP clusters. The distribution of the number of genes containing putative alternative promoters shows a plateau before the interval size reaches 500 bp (12). We therefore clustered the clones using a 500 base interval and defined each cluster as a promoter. We obtained 30 964 promoters, and 26 784 (86.5%) of them are within 500 bp. According to this procedure, 6954 human loci and 9886 mouse loci have only one promoter, while 8308 human loci and 4276 mouse loci have two promoters or more.

Figure 1A shows the three alternative promoters found in the gene encoding human A kinase anchor protein 1 (AKAP1). It is notable that DBTSS also provides comparative information between human and mouse promoters. Figure 1B shows an example of comparative promoter analysis between orthologous genes. Two promoters were identified for the mouse gene for AKAP1. From this view, the representative APs are also available for alignment. By clicking ‘Comparative View’ in ‘Promoter Comparison’ in Figure 1B, the LALIGN-based alignment view shown in Figure 1C is obtained.
Figure 1

An example of alternative promoter view. Here we show AKAP1 (NM_003488) as an example. (A) The putative promoter clusters are given using different colors; therefore, there are three putative promoters in human AKAP1. We observed several patterns of first exon in promoter type 1, so we clustered them and show them as ‘First exon variant type A–F’. (B) Comparative analysis between human and mouse alternative promoters. There are two putative promoters in mouse. The best match between two promoters is available in ‘Promoter Comparison’. (C) By clicking ‘Comparative View’, the user can obtain the alignment between these promoters.
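
The promoter-definition procedure summarized in this section (drop TSSs that fall inside non-first exons, then cluster the remaining TSSs with a 500 base interval) can be sketched as follows. This is an illustration under stated assumptions rather than the published method, whose details are in (12): positions are integers on one locus, exons are given as inclusive `(start, end)` pairs in transcript order, and adjacent TSSs closer than the interval are chained into the same cluster. Both function names are hypothetical.

```python
from typing import List, Tuple


def drop_internal_exon_tss(positions: List[int],
                           exons: List[Tuple[int, int]]) -> List[int]:
    """Remove TSSs lying inside any exon except the first (likely truncated 5' ends)."""
    internal = exons[1:]  # the first exon is exempt from filtering
    return [p for p in positions
            if not any(start <= p <= end for start, end in internal)]


def cluster_tss(positions: List[int], interval: int = 500) -> List[List[int]]:
    """Group TSSs so that adjacent members of a cluster are < interval bases apart."""
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] < interval:
            clusters[-1].append(pos)   # close to the previous TSS: same promoter
        else:
            clusters.append([pos])     # gap of interval or more: new putative promoter
    return clusters


tss_positions = [100, 250, 700, 1300, 1350, 5000]
exon_coords = [(90, 300), (1200, 1400)]  # first exon, then one internal exon

filtered = drop_internal_exon_tss(tss_positions, exon_coords)
promoters = cluster_tss(filtered)
print(len(promoters), "putative promoters")
```

Because clustering here chains neighbouring TSSs, a single cluster can extend beyond 500 bases when many TSSs lie close together; whether the original analysis behaves the same way is an assumption of this sketch.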

COMPARATIVE PROMOTER ANALYSIS

In the previous section, we showed an example of alternative promoter comparison between human and mouse. Before version 5 of DBTSS, these comparisons were precomputed, and the user could only obtain alignments between orthologous human and mouse genes. Although useful, this sometimes failed to meet the user's need for alignments of arbitrary promoter pairs, for instance, promoters of paralogous genes. We therefore implemented a dynamic viewer allowing the alignment of any two TSSs present in DBTSS. Such analyses are necessary to understand how transcriptional regulatory elements were conserved or diverged during gene and exon duplication. For example, in Figure 2A, the clones TST01431 of protamine 1 (PRM1: NM_002761) and TST00906 of protamine 2 (PRM2: NM_002762) are selected for alignment. Both genes are expressed in testis and are paralogous to each other. PRM1 is found in nearly all mammals, while PRM2 is observed in relatively few mammals, including human and mouse (13). In human, both genes are on chromosome 16, separated by ∼5 kb (14). The obtained alignment and the determined conserved regions are shown in Figure 2B. In this case, the blocks ‘0’ and ‘7’ are highly conserved. The details of the alignment of both TSS regions are also available, as shown in Figure 2C. It is particularly noteworthy that the putative TATA-box is inside block ‘7’ for the PRM1 promoter and outside of it for the PRM2 promoters (15).
Figure 2

An example of comparative analysis with any pair of TSSs. We show paralogous genes, protamine 1 (PRM1: NM_002761) and protamine 2 (PRM2: NM_002762), as an example. (A) By inputting the IDs of clones of PRM1 (TST01431) and PRM2 (TST00906) representative TSSs, users can obtain the results (B and C). (B) LALIGN analysis between two sequences. Note: smaller numbers indicate more highly conserved blocks. In this figure, the most conserved region between a pair is block 0; however, it includes Alu repeats. (C) The detail of the alignment of block 7. The putative TATA-boxes are marked with boxes.
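
To give a feel for what a local alignment between two promoter sequences computes, here is a minimal Smith-Waterman sketch. It is a stand-in for illustration only: DBTSS uses LALIGN, which reports several local alignments (the numbered blocks in Figure 2B), whereas this sketch returns just the single best-scoring one. The scoring parameters and the two short sequences are invented for the example and are not PRM1 or PRM2 data.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Return (score, aligned_a, aligned_b) for the best local alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; scores never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Invented toy sequences sharing a TATA-like core.
result = smith_waterman("GGGTATAAAAGCC", "CCTATAAAAGTT")
print(result)
```

The quadratic table makes this suitable only for short regions; it is meant to show the principle behind a conserved block, not to replace a dedicated alignment tool.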

FUTURE PERSPECTIVE

As shown in Table 1, we have added data from 32 263 zebrafish (Danio rerio) (16), 10 236 malaria (Plasmodium falciparum) (17) and 22 923 schyzon (Cyanidioschyzon merolae) (18) 5′ end clones. These correspond to 3061 zebrafish, 1527 malaria and 3635 schyzon genes. We will continue to expand DBTSS by adding TSS information for other species, such as macaque, when the relevant data become publicly available. Such data will give us a deeper insight into how the transcriptional regulatory networks have been shaped into their current form in humans, in terms of the molecular evolution of the promoters.

REFERENCES (10 of 18 shown)

1.  DBTSS: DataBase of human Transcriptional Start Sites and full-length cDNAs.

Authors:  Yutaka Suzuki; Riu Yamashita; Kenta Nakai; Sumio Sugano
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

2.  Complete sequencing and characterization of 21,243 full-length human cDNAs.

Authors:  Toshio Ota; Yutaka Suzuki; Tetsuo Nishikawa; Tetsuji Otsuki; Tomoyasu Sugiyama; Ryotaro Irie; Ai Wakamatsu; Koji Hayashi; Hiroyuki Sato; Keiichi Nagai; Kouichi Kimura; Hiroshi Makita; Mitsuo Sekine; Masaya Obayashi; Tatsunari Nishi; Toshikazu Shibahara; Toshihiro Tanaka; Shizuko Ishii; Jun-ichi Yamamoto; Kaoru Saito; Yuri Kawai; Yuko Isono; Yoshitaka Nakamura; Kenji Nagahari; Katsuhiko Murakami; Tomohiro Yasuda; Takao Iwayanagi; Masako Wagatsuma; Akiko Shiratori; Hiroaki Sudo; Takehiko Hosoiri; Yoshiko Kaku; Hiroyo Kodaira; Hiroshi Kondo; Masanori Sugawara; Makiko Takahashi; Katsuhiro Kanda; Takahide Yokoi; Takako Furuya; Emiko Kikkawa; Yuhi Omura; Kumi Abe; Kumiko Kamihara; Naoko Katsuta; Kazuomi Sato; Machiko Tanikawa; Makoto Yamazaki; Ken Ninomiya; Tadashi Ishibashi; Hiromichi Yamashita; Katsuji Murakawa; Kiyoshi Fujimori; Hiroyuki Tanai; Manabu Kimata; Motoji Watanabe; Susumu Hiraoka; Yoshiyuki Chiba; Shinichi Ishida; Yukio Ono; Sumiyo Takiguchi; Susumu Watanabe; Makoto Yosida; Tomoko Hotuta; Junko Kusano; Keiichi Kanehori; Asako Takahashi-Fujii; Hiroto Hara; Tomo-o Tanase; Yoshiko Nomura; Sakae Togiya; Fukuyo Komai; Reiko Hara; Kazuha Takeuchi; Miho Arita; Nobuyuki Imose; Kaoru Musashino; Hisatsugu Yuuki; Atsushi Oshima; Naokazu Sasaki; Satoshi Aotsuka; Yoko Yoshikawa; Hiroshi Matsunawa; Tatsuo Ichihara; Namiko Shiohata; Sanae Sano; Shogo Moriya; Hiroko Momiyama; Noriko Satoh; Sachiko Takami; Yuko Terashima; Osamu Suzuki; Satoshi Nakagawa; Akihiro Senoh; Hiroshi Mizoguchi; Yoshihiro Goto; Fumio Shimizu; Hirokazu Wakebe; Haretsugu Hishigaki; Takeshi Watanabe; Akio Sugiyama; Makoto Takemoto; Bunsei Kawakami; Masaaki Yamazaki; Koji Watanabe; Ayako Kumagai; Shoko Itakura; Yasuhito Fukuzumi; Yoshifumi Fujimori; Megumi Komiyama; Hiroyuki Tashiro; Akira Tanigami; Tsutomu Fujiwara; Toshihide Ono; Katsue Yamada; Yuka Fujii; Kouichi Ozaki; Maasa Hirao; Yoshihiro Ohmori; Ayako Kawabata; Takeshi Hikiji; Naoko Kobatake; Hiromi Inagaki; Yasuko Ikema; Sachiko 
Okamoto; Rie Okitani; Takuma Kawakami; Saori Noguchi; Tomoko Itoh; Keiko Shigeta; Tadashi Senba; Kyoka Matsumura; Yoshie Nakajima; Takae Mizuno; Misato Morinaga; Masahide Sasaki; Takushi Togashi; Masaaki Oyama; Hiroko Hata; Manabu Watanabe; Takami Komatsu; Junko Mizushima-Sugano; Tadashi Satoh; Yuko Shirai; Yukiko Takahashi; Kiyomi Nakagawa; Koji Okumura; Takahiro Nagase; Nobuo Nomura; Hisashi Kikuchi; Yasuhiko Masuho; Riu Yamashita; Kenta Nakai; Tetsushi Yada; Yusuke Nakamura; Osamu Ohara; Takao Isogai; Sumio Sugano
Journal:  Nat Genet       Date:  2003-12-21       Impact factor: 38.330

Review 3.  Complex controls: the role of alternative promoters in mammalian genomes.

Authors:  Josette-Renée Landry; Dixie L Mager; Brian T Wilhelm
Journal:  Trends Genet       Date:  2003-11       Impact factor: 11.639

4.  Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D.

Authors:  Motomichi Matsuzaki; Osami Misumi; Tadasu Shin-I; Shinichiro Maruyama; Manabu Takahara; Shin-Ya Miyagishima; Toshiyuki Mori; Keiji Nishida; Fumi Yagisawa; Keishin Nishida; Yamato Yoshida; Yoshiki Nishimura; Shunsuke Nakao; Tamaki Kobayashi; Yu Momoyama; Tetsuya Higashiyama; Ayumi Minoda; Masako Sano; Hisayo Nomoto; Kazuko Oishi; Hiroko Hayashi; Fumiko Ohta; Satoko Nishizaka; Shinobu Haga; Sachiko Miura; Tomomi Morishita; Yukihiro Kabeya; Kimihiro Terasawa; Yutaka Suzuki; Yasuyuki Ishii; Shuichi Asakawa; Hiroyoshi Takano; Niji Ohta; Haruko Kuroiwa; Kan Tanaka; Nobuyoshi Shimizu; Sumio Sugano; Naoki Sato; Hisayoshi Nozaki; Naotake Ogasawara; Yuji Kohara; Tsuneyoshi Kuroiwa
Journal:  Nature       Date:  2004-04-08       Impact factor: 49.962

5.  Parallelization of a local similarity algorithm.

Authors:  X Huang; W Miller; S Schwartz; R C Hardison
Journal:  Comput Appl Biosci       Date:  1992-04

6.  Oligo-capping: a simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides.

Authors:  K Maruyama; S Sugano
Journal:  Gene       Date:  1994-01-28       Impact factor: 3.688

7.  Genomic sequences of human protamines whose genes, PRM1 and PRM2, are clustered.

Authors:  L Domenjoud; G Nussbaum; I M Adham; G Greeske; W Engel
Journal:  Genomics       Date:  1990-09       Impact factor: 5.736

8.  Full-malaria 2004: an enlarged database for comparative studies of full-length cDNAs of malaria parasites, Plasmodium species.

Authors:  Junichi Watanabe; Yutaka Suzuki; Masahide Sasaki; Sumio Sugano
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

9.  The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC).

Authors:  Daniela S Gerhard; Lukas Wagner; Elise A Feingold; Carolyn M Shenmen; Lynette H Grouse; Greg Schuler; Steven L Klein; Susan Old; Rebekah Rasooly; Peter Good; Mark Guyer; Allison M Peck; Jeffery G Derge; David Lipman; Francis S Collins; Wonhee Jang; Steven Sherry; Mike Feolo; Leonie Misquitta; Eduardo Lee; Kirill Rotmistrovsky; Susan F Greenhut; Carl F Schaefer; Kenneth Buetow; Tom I Bonner; David Haussler; Jim Kent; Mark Kiekhaus; Terry Furey; Michael Brent; Christa Prange; Kirsten Schreiber; Nicole Shapiro; Narayan K Bhat; Ralph F Hopkins; Florence Hsie; Tom Driscoll; M Bento Soares; Tom L Casavant; Todd E Scheetz; Michael J Brown-stein; Ted B Usdin; Shiraki Toshiyuki; Piero Carninci; Yulan Piao; Dawood B Dudekula; Minoru S H Ko; Koichi Kawakami; Yutaka Suzuki; Sumio Sugano; C E Gruber; M R Smith; Blake Simmons; Troy Moore; Richard Waterman; Stephen L Johnson; Yijun Ruan; Chia Lin Wei; S Mathavan; Preethi H Gunaratne; Jiaqian Wu; Angela M Garcia; Stephen W Hulyk; Edwin Fuh; Ye Yuan; Anna Sneed; Carla Kowis; Anne Hodgson; Donna M Muzny; John McPherson; Richard A Gibbs; Jessica Fahey; Erin Helton; Mark Ketteman; Anuradha Madan; Stephanie Rodrigues; Amy Sanchez; Michelle Whiting; Anup Madari; Alice C Young; Keith D Wetherby; Steven J Granite; Peggy N Kwong; Charles P Brinkley; Russell L Pearson; Gerard G Bouffard; Robert W Blakesly; Eric D Green; Mark C Dickson; Alex C Rodriguez; Jane Grimwood; Jeremy Schmutz; Richard M Myers; Yaron S N Butterfield; Malachi Griffith; Obi L Griffith; Martin I Krzywinski; Nancy Liao; Ryan Morin; Ryan Morrin; Diana Palmquist; Anca S Petrescu; Ursula Skalska; Duane E Smailus; Jeff M Stott; Angelique Schnerch; Jacqueline E Schein; Steven J M Jones; Robert A Holt; Agnes Baross; Marco A Marra; Sandra Clifton; Kathryn A Makowski; Stephanie Bosak; Joel Malek
Journal:  Genome Res       Date:  2004-10       Impact factor: 9.043

10.  Conservation of the PRM1 --> PRM2 --> TNP2 domain.

Authors:  Susan M Wykes; Stephen A Krawetz
Journal:  DNA Seq       Date:  2003-10