
Tetrahymena Genome Database (TGD): a new genomic resource for Tetrahymena thermophila research.

Nicholas A Stover, Cynthia J Krieger, Gail Binkley, Qing Dong, Dianna G Fisk, Robert Nash, Anand Sethuraman, Shuai Weng, J Michael Cherry.

Abstract

We have developed a web-based resource (available at www.ciliate.org) for researchers studying the model ciliate organism Tetrahymena thermophila. Employing the underlying database structure and programming of the Saccharomyces Genome Database, the Tetrahymena Genome Database (TGD) integrates the wealth of knowledge generated by the Tetrahymena research community about genome structure, genes and gene products with the newly sequenced macronuclear genome determined by The Institute for Genomic Research (TIGR). TGD provides information curated from the literature about each published gene, including a standardized gene name, a link to the genomic locus in our graphical genome browser, gene product annotations utilizing the Gene Ontology, links to published literature about the gene and more. TGD also displays automatic annotations generated for the gene models predicted by TIGR. A variety of tools are available at TGD for searching the Tetrahymena genome, its literature and information about members of the research community.

Year:  2006        PMID: 16381920      PMCID: PMC1347417          DOI: 10.1093/nar/gkj054

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Research on the ciliated protozoan Tetrahymena thermophila (here referred to simply as Tetrahymena) has been providing remarkable insights into basic biological principles for over 40 years. Among the most notable discoveries made have been Type-I self-splicing introns, the molecular motor dynein, a histone modification ‘code’ responsible for different chromatin states and the riboprotein complex telomerase (1). More recently, studies of the RNA interference pathway and tubulin modification in Tetrahymena cells continue to reveal intriguing biological processes. These studies all owe their success in part to the fascinating cell biology and life cycle of Tetrahymena. Tetrahymena are large cells (∼30 µm × 50 µm), each covered with hundreds of cilia and containing two nuclei: the huge, polyploid macronucleus, which serves as the somatic nucleus for the cell, and the transcriptionally silent micronucleus, which generates gametes in times of stress. Tetrahymena's extreme amplification of the macronuclear chromosome containing the rDNA (∼9000 haploid copies per cell) has been a boon to the study of chromosome features, such as telomeres and DNA replication origins, whereas the dense collection of cilia and basal bodies at the cell surface has made it a premier system for dissecting the components of these structures.
The Tetrahymena Genome Database (TGD; www.ciliate.org) was established by the Tetrahymena Macronuclear Genome Project as a manually curated and updated resource linking past and future Tetrahymena research to the newly available genome sequence. By working in close collaboration with the staff of the Saccharomyces Genome Database (SGD), we have developed TGD by making only minor modifications to the software and database environment of the SGD project (2).
In addition to providing the bioinformatic tools and annotation efforts described below, TGD serves as a community information center, with lists of upcoming meetings, a primer on ciliate biology and contact information for members of the Tetrahymena research community.

GENE CURATION AND DISPLAY

To date, over 250 actively studied genes in the Tetrahymena macronuclear genome have been named and described by authors in the research community. Each of these published genes is presented on a TGD web page containing basic information we have collected about that gene, plus links to other TGD pages with more information and links out to other resources. For each published gene we have (i) identified a standardized gene name that conforms to the published Tetrahymena nomenclature guidelines (3), plus any aliases used for that gene in the literature; (ii) collected relevant literature citations and provided links to these papers' entries at PubMed; (iii) written short, free-text descriptions summarizing the knowledge about the gene; and (iv) provided a link to one or more GenBank entries for the gene, to allow users to access the sequence of the gene described in the literature. We are currently annotating the molecular function, biological process and cellular localization of each gene product for which data are available, using terms from the Gene Ontology (GO) (4). The GO is a controlled vocabulary used by many model organism databases to describe these gene features in a species-independent fashion, allowing users to easily search, sort and compare genes in diverse organisms. An integral part of performing these annotations is updating the GO with terms necessary to describe the biology of ciliates. TGD has already contributed a number of GO terms and definitions, including new terms for processes related to the RNA interference pathway, nuclear dimorphism, translation termination and other important areas of ciliate biology.
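The value of a species-independent vocabulary can be illustrated with a minimal sketch. The code below is not TGD's or SGD's software; the `Annotation` record, the `genes_by_term` helper and the example annotations are illustrative assumptions, chosen only to show how genes from different organisms become directly comparable once they are annotated with the same GO term.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    """One GO annotation (hypothetical record layout, not TGD's schema)."""
    organism: str
    gene: str
    aspect: str  # molecular_function, biological_process or cellular_component
    term: str    # GO term name


def genes_by_term(annotations):
    """Index genes from any organism by the (aspect, term) pair they share."""
    index = defaultdict(set)
    for a in annotations:
        index[(a.aspect, a.term)].add((a.organism, a.gene))
    return index


# Illustrative annotations only; they are not copied from TGD or SGD.
EXAMPLE = [
    Annotation("Tetrahymena thermophila", "DYH1",
               "molecular_function", "microtubule motor activity"),
    Annotation("Saccharomyces cerevisiae", "DYN1",
               "molecular_function", "microtubule motor activity"),
    Annotation("Tetrahymena thermophila", "DYH1",
               "cellular_component", "cilium"),
]
```

Looking up `genes_by_term(EXAMPLE)[("molecular_function", "microtubule motor activity")]` returns both the ciliate and the yeast gene, which is the kind of cross-organism query a shared vocabulary makes possible.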

SEQUENCE AND GENE MODELS

The Institute for Genomic Research (TIGR) has sequenced the Tetrahymena macronuclear genome, in an effort led by Jonathan Eisen. The ∼106 megabase haploid macronuclear genome is predicted to be arranged into 250–300 chromosomes; the sequence closure effort at TIGR is ongoing, with over 40% of the genomic sequence assembled into complete chromosomes bounded by telomeres at both ends. TIGR has submitted the genome sequences to the Whole Genome Shotgun depository at National Center for Biotechnology Information (NCBI) under accession number AAGF00000000. TIGR has released a preliminary set of gene model predictions using their genome scaffolds. The number of protein-coding genes predicted in their analysis is 27,400. The gene models currently available have not been manually reviewed and, reflecting their preliminary nature, may contain inaccuracies in their coding sequence boundaries when compared to cDNA sequences. Nonetheless, the preliminary gene models have proven to be very useful to the Tetrahymena research community, and studies identifying new genes and gene families based on these gene models are already being published (5–10). In order to accommodate the ongoing gene model refinements at TIGR, we have adopted a two-tiered gene page, an example of which is shown in Figures 1 and 2. The upper section of the page (Figure 1) displays the information we have curated from the literature about a gene, plus a link to its published sequence at GenBank. The lower section of the page (Figure 2) displays the automatic annotations generated by TIGR and TGD for the corresponding gene model. As TIGR releases updates, information curated for particular genes can be linked to alternate gene models as appropriate. If there is no published information for a gene corresponding to a given gene model, the gene page will only display information about the gene model.
We have found this to be an effective method for displaying the hypothetical gene models and automatic, non-reviewed annotations of a newly sequenced genome, while simultaneously presenting and maintaining the integrity of information published about a slightly different coding sequence.
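The two-tiered arrangement can be pictured as two loosely coupled records. The sketch below is a hypothetical data model, not TGD's actual schema: the class names, fields and the `relink` helper are assumptions made for illustration. It shows the curated record holding only a reference to a gene model, so that the reference can be moved to a newer model without touching the curated information.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GeneModel:
    """Automatic, non-reviewed prediction (lower section of the page)."""
    model_id: str
    description: str


@dataclass
class CuratedGene:
    """Literature-curated information (upper section of the page)."""
    standard_name: str
    description: str
    aliases: list = field(default_factory=list)
    model_id: Optional[str] = None  # reference to the current gene model


def page_sections(model, curated=None):
    """Return the sections shown on a gene page, curated tier first."""
    sections = []
    if curated is not None:
        sections.append(f"CURATED: {curated.standard_name} - {curated.description}")
    # The gene model tier is always shown, even without published data.
    sections.append(f"MODEL: {model.model_id} - {model.description}")
    return sections


def relink(curated, new_model):
    """Point a curated gene at an updated gene model; curated data is unchanged."""
    curated.model_id = new_model.model_id
```

For example, `page_sections(GeneModel("3.m01901", "dynein heavy chain"))` yields only the model tier, while passing a `CuratedGene` for DYH1 as the second argument adds the curated tier above it. The description strings here are placeholders.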
Figure 1

The upper section of the gene page for the Tetrahymena dynein heavy chain DYH1 presents information about the published gene and its product, including its standard name and aliases, a short description, a graphical display of the gene, GO annotations and links to its literature and gene sequence at GenBank.

Figure 2

The lower section of the gene page for DYH1 presents computational annotation of gene model 3.m01901, the preliminary gene model corresponding to the DYH1 gene. The information displayed includes a short description of the 3.m01901 gene product provided by TIGR, a link to its gene model page at TIGR, its top three BLASTP hits against the UniRef90 protein database, a graphical display of the gene model, protein physical properties, protein domains predicted using InterProScan and automatic GO annotations based on its predicted protein domains.

TOOLS

We have created a server and an interface at our website for BLAST (11) and BLAT (12) searches against a number of relevant sequence datasets. Searching against the Tetrahymena macronuclear genome scaffold sequences produces a graphical view of the target sequence region in the genome, in addition to the sequence alignment for this region. This allows the user to see annotations of predicted gene models found in the region, directly from the BLAST/BLAT results page. The graphic is hyperlinked to GBrowse, a graphical genome browser utility available from the Generic Model Organism Database (GMOD) Construction Set (13), which TGD uses to display Tetrahymena genome data. GBrowse allows a rich display of annotations to the genome sequence using a combination of text and icons, which is particularly useful for showing large-scale data in the context of the genome sequence. Proteomic analyses have identified the composition of different organelles and structures (6,14), and microarray expression analyses are anticipated by the ciliate research community. TIGR and TGD have combined to provide a basic set of annotations for each preliminary gene model. TGD determined the protein domain composition of each model using the InterProScan utility (15), and performed a BLASTP comparison of TIGR's preliminary gene models against the UniRef90 protein database (16). Gene models shown in GBrowse are labeled with the domains or top BLASTP hit determined by TGD, together with the gene product description provided by TIGR's automatic annotation. Expanded domain, homolog and gene product description data are shown on the preliminary model section of each gene page, plus computationally determined GO annotations based on its protein domain composition. A link to TIGR's analyses of the gene models is found on each of the gene pages. 
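The labeling step described above can be sketched as follows. This is an assumption-laden illustration rather than TGD's pipeline: it assumes BLASTP results are available in the 12-column tab-separated layout produced by current NCBI BLAST+ (`-outfmt 6`, with the bit score in the last column), and the `top_hits` and `browser_label` helpers, as well as the choice to prefer domains over the top hit, are hypothetical.

```python
from collections import defaultdict


def top_hits(tabular_lines, n=3):
    """Keep the n highest-scoring hits per query from 12-column tabular BLAST output."""
    hits = defaultdict(list)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        hits[query].append((subject, bitscore))
    return {
        query: sorted(found, key=lambda hit: hit[1], reverse=True)[:n]
        for query, found in hits.items()
    }


def browser_label(model_id, domains, hits, description):
    """Build a one-line label: domains if present, otherwise the top hit.

    The precedence of domains over hits is an assumption made for this sketch.
    """
    if domains:
        evidence = ", ".join(domains)
    elif hits:
        evidence = hits[0][0]
    else:
        evidence = "no domain or homolog found"
    return f"{model_id}: {evidence} | {description}"
```

Calling `top_hits` with its default `n=3` corresponds to the three BLASTP hits shown on the gene page, and `browser_label` combines that evidence with the gene product description in a single string suitable for a genome browser track.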
Information in TGD can be accessed using TGD's Quick Search utility, which allows users to query keywords in the following fields: gene names and aliases, gene descriptions, GO terms, predicted domains, homologs, paper abstracts, colleagues and authors of references in TGD. TGD provides full-text searching of Tetrahymena-related texts available in electronic format via Textpresso, a full-text literature search and information extraction tool available from GMOD (17). In addition to keyword searching, Textpresso allows users to search for the coincidence of keywords and terms from a number of defined categories, in a single sentence or article. Textpresso at TGD has a number of ciliate-related terms added to these categories (i.e. species names and nuclear terms) and searches papers that use the word ‘Tetrahymena’ in their keywords or abstract. Currently over 1100 full-text articles, 3200 abstracts and 5000 titles can be searched at TGD.
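The sentence-level search described for Textpresso can be illustrated with a small sketch. It is not Textpresso's implementation: the `CATEGORIES` table, the naive regular-expression sentence splitter and the `cooccurring_sentences` function are hypothetical simplifications that only show the idea of requiring a keyword and a category term in the same sentence.

```python
import re

# Hypothetical category lexicon; the real categories are far larger.
CATEGORIES = {
    "nucleus": {"macronucleus", "micronucleus"},
    "species": {"tetrahymena", "paramecium"},
}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurring_sentences(text, keyword, category):
    """Return sentences containing both the keyword and any term of the category."""
    terms = CATEGORIES[category]
    matches = []
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        if keyword.lower() in words and words & terms:
            matches.append(sentence)
    return matches
```

Restricting the match to a single sentence is stricter than matching anywhere in an article, so it tends to return passages where the keyword and the category term are actually discussed together.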

SUMMARY

TGD serves the Tetrahymena research community by linking the many years of published Tetrahymena research data to the recently completed macronuclear genome sequence. The resources we have developed are freely available at www.ciliate.org. TGD's close relationship with the Saccharomyces Genome Database allows it to quickly adopt tools and displays created by SGD to deliver a wide variety of yeast data. Please contact the TGD curators at ciliate-curator@genome.stanford.edu with any comments or suggestions.

REFERENCES (17 in total)

1.  Genetic nomenclature rules for Tetrahymena thermophila.

Authors:  S L Allen
Journal:  Methods Cell Biol       Date:  2000       Impact factor: 1.441

2.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

3.  BLAT--the BLAST-like alignment tool.

Authors:  W James Kent
Journal:  Genome Res       Date:  2002-04       Impact factor: 9.043

4.  InterProScan--an integration platform for the signature-recognition methods in InterPro.

Authors:  E M Zdobnov; R Apweiler
Journal:  Bioinformatics       Date:  2001-09       Impact factor: 6.937

5.  UniProt: the Universal Protein knowledgebase.

Authors:  Rolf Apweiler; Amos Bairoch; Cathy H Wu; Winona C Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; Maria J Martin; Darren A Natale; Claire O'Donovan; Nicole Redaschi; Lai-Su L Yeh
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

6.  The generic genome browser: a building block for a model organism system database.

Authors:  Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

7.  Local alignment statistics.

Authors:  S F Altschul; W Gish
Journal:  Methods Enzymol       Date:  1996       Impact factor: 1.600

8.  Genomic and proteomic evidence for a second family of dense core granule cargo proteins in Tetrahymena thermophila.

Authors:  Grant R Bowman; Daryl G S Smith; K W Michael Siu; Ronald E Pearlman; Aaron P Turkewitz
Journal:  J Eukaryot Microbiol       Date:  2005 Jul-Aug       Impact factor: 3.346

9.  Myosin genes in Tetrahymena. (Review)

Authors:  Selwyn A Williams; R H Gavin
Journal:  Cell Motil Cytoskeleton       Date:  2005-08

10.  Textpresso: an ontology-based information retrieval and extraction system for biological literature.

Authors:  Hans-Michael Müller; Eimear E Kenny; Paul W Sternberg
Journal:  PLoS Biol       Date:  2004-09-21       Impact factor: 8.029

