TB database: an integrated platform for tuberculosis research.

T B K Reddy, Robert Riley, Farrell Wymore, Phillip Montgomery, Dave DeCaprio, Reinhard Engels, Marcel Gellesch, Jeremy Hubble, Dennis Jen, Heng Jin, Michael Koehrsen, Lisa Larson, Maria Mao, Michael Nitzberg, Peter Sisk, Christian Stolte, Brian Weiner, Jared White, Zachariah K Zachariah, Gavin Sherlock, James E Galagan, Catherine A Ball, Gary K Schoolnik.

Abstract

The effective control of tuberculosis (TB) has been thwarted by the need for prolonged, complex and potentially toxic drug regimens, by reliance on an inefficient vaccine and by the absence of biomarkers of clinical status. The promise of the genomics era for TB control is substantial, but has been hindered by the lack of a central repository that collects and integrates genomic and experimental data about this organism in a way that can be readily accessed and analyzed. The Tuberculosis Database (TBDB) is an integrated database providing access to TB genomic data and resources, relevant to the discovery and development of TB drugs, vaccines and biomarkers. The current release of TBDB houses genome sequence data and annotations for 28 different Mycobacterium tuberculosis strains and related bacteria. TBDB stores pre- and post-publication gene-expression data from M. tuberculosis and its close relatives. TBDB currently hosts data for nearly 1500 public tuberculosis microarrays and 260 arrays for Streptomyces. In addition, TBDB provides access to a suite of comparative genomics and microarray analysis software. By bringing together M. tuberculosis genome annotation and gene-expression data with a suite of analysis tools, TBDB (http://www.tbdb.org/) provides a unique discovery platform for TB research.

Year: 2008 PMID: 18835847 PMCID: PMC2686437 DOI: 10.1093/nar/gkn652

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

In humans, tuberculosis (TB) is caused by the bacterium Mycobacterium tuberculosis and primarily targets the lungs (as pulmonary TB), but can also affect other organs, including the brain and meninges, lymph nodes, bone and joints, the genitourinary system and the intestine and liver. TB is today the second highest cause of death from infectious diseases after HIV/AIDS (1) and is the biggest killer of people infected with HIV (2). The World Health Organization's most recent global data (from 2005) show that every year 8 million people become ill with tuberculosis and 2 million people die of the disease. A third of the world's population has been exposed to TB, making this disease one of the greatest global health challenges facing us today (3).

A remarkable feature of TB is its ability to enter an asymptomatic latent phase lasting years or even decades. Activation of a latent infection can be precipitated by changes in the physiological and immune status of the host owing to declining cell-mediated immunity associated with senescence, malnutrition and diabetes or the occurrence of other diseases, especially HIV/AIDS (4). Chemotherapy for active TB due to drug-sensitive strains entails the use of multiple antibiotics administered for 6 months. This complicated and frequently toxic treatment regimen often results in poor patient compliance. This in turn has led to the emergence of antibiotic-resistant strains that require longer treatment courses, the use of less effective and more toxic drugs and higher failure rates (5). As a result, TB remains a widespread and deadly disease whose control will require more effective public health measures and the development of new drugs and vaccines.

Recent developments in genomics and the availability of the complete M. tuberculosis genome sequence (6) have led to the use of genome-wide expression profiling and comparative genomics methods to better understand M. tuberculosis pathology, latency, emerging drug resistance and evolution. However, despite the widespread use of functional and comparative genomics to study M. tuberculosis, there has been no single repository for these large-scale datasets, complete with high-quality experimental annotation, and connected to up-to-date gene annotation and comparative genomic information. Instead, many of these data have been scattered across separate resources, such as GenoMycDB, a database for comparative analysis of mycobacterial genes and genomes (7), and MGDD, the M. tuberculosis genome divergence database (8), that employ diverse and often incompatible formats and analytical tools. The Tuberculosis Database (TBDB) was developed to address this gap. TBDB uses software from the Stanford Microarray Database (SMD) (9) and the Broad Institute's Calhoun system (10,11), and houses gene-expression data paired with genome sequence and annotation data. Uniting experimental data with genome sequence data enables researchers to ask complex questions and draw inferences that would be impossible from individual small datasets alone. In this context, TBDB brings together powerful genomics tools to advance M. tuberculosis research in ways that will contribute to the identification of new drug targets, vaccine antigens, diagnostics and host biomarkers.

TBDB OVERVIEW

TBDB is an integrated database that houses both annotated genome sequence data and microarray and RT–PCR expression data from in vitro experiments and TB-infected tissues. TBDB houses genome sequence data for several M. tuberculosis strains as well as data for numerous related species. These data and annotations include publicly available sequences from a number of sequencing centers and groups, including sequences being produced by the Broad Institute's Microbial Sequencing Center. The microarray data within TBDB are predominantly from M. tuberculosis, but we are in the process of incorporating in vivo data from infected host tissues (principally human, primate and murine) into TBDB. Experimental data may be deposited into TBDB by any TB researcher prior to publication providing prepublication access to tools for the analysis, annotation, visualization and sharing of data. The data are then made public at the author's request or following publication, whichever is first. In addition, TBDB curators search the literature for publications containing relevant TB or host microarray data. The primary data are then requested from the authors of such publications and are entered into TBDB, where the experiments are annotated and made public so other researchers can reanalyze the data (often in conjunction with other datasets within TBDB) using TBDB tools. Table 1 lists TBDB statistics, including the number of annotated genomes in TBDB, microarray experiments, publications and other data types.

Table 1.

Summary of TBDB data content (as of September 2008)

TBDB data statistics
Number of genomes	28
Number of all microarrays	∼5500
Number of public microarrays	∼1800
Number of publications	27
Number of experiment sets	160

The first route of entry into TBDB is the Quick Search feature, which allows a user to search all objects in TBDB by gene name, gene sequence name, author name, title or any other keyword. The result page of a Quick Search provides a count of genes, microarray experiments, operons, gene families and other database objects that match the query. Links from this results page provide direct access to pages with detailed information about particular objects, such as the Gene Detail and Publication pages. Quick Search is available at the top of every TBDB page, and thus provides an easily accessible single integrated access point to all genome annotation and expression data in TBDB.

TBDB GENOMES

TBDB currently houses genome sequence data for M. tuberculosis strain H37Rv (a standard prototype strain long used for experimental and animal infection studies), as well as other M. tuberculosis strains and bacteria from related taxa, focusing on members of the Actinomycetes family of high G+C content, Gram-positive organisms of which M. tuberculosis is a member. These genome sequences have been annotated with a variety of genomic features including genes, operons, sequence similarity to GenBank sequences using BLAST (12), transfer RNAs using tRNAScan (13), protein domains and families using PFAM (14) and noncoding RNAs based on RFAM (15). Known immune epitopes have also been mapped through collaboration with BioHealthBase (16). A suite of analytical tools is also provided to allow comparative genomic analysis of M. tuberculosis. Table 2 lists the genomes in TBDB for which sequence data are available along with their size and the number of annotated genes. Access to the annotated genome sequences and comparative data is provided through several search interfaces, some of which are described subsequently.

Table 2.

List of annotated genomes in TBDB

Organism	Size (Mb)	Genes
M. tuberculosis H37Rv	4.41	3999
M. tuberculosis CDC1551	4.4	4189
M. tb. F11 (finished)	4.42	3959
M. tb. C	4.38	3851
M. tb. Haarlem	4.4	3866
M. bovis AF2122/97	4.35	3920
M. bovis BCG	4.37	3952
M. leprae TN	3.27	1605
M. avium 104	5.48	5120
M. avium k10	4.83	4350
M. smegmatis MC2 155	6.99	6716
M. marinum	6.64	5423
M. ulcerans Agy99	5.63	4160
M. vanbaalenii PYR-1	6.49	5979
M. sp. KMS	6.26	5975
M. sp. MCS	5.71	5391
Rhodococcus sp. RHA1	9.7	9145
Nocardia farcinica IFM 10152	6.02	5683
Corynebacterium glutamicum ATCC 13032	3.28	3057
C. diphtheriae NCTC 13129	2.49	2272
C. efficiens YS-314	3.15	2950
C. jeikeium K411	2.48	2120
Streptomyces avermitilis MA-4680	9.12	7673
S. coelicolor A3(2)	8.67	7825
Propionibacterium acnes KPA171202	2.56	2297
Acidothermus cellulolyticus 11B	2.44	2157
Bifidobacterium longum NCC2705	2.26	1727
Rhodobacter sphaeroides	4.6	4242

Feature detail pages

All information about annotated features on any genome sequence is available through Feature Detail pages, of which the Gene Detail page is the most common example (Figure 1). Information presented in the Gene Detail page is organized into different sections. These include Gene Info, Gene Expression, Functional Annotation, Transcript Info, Sequence and genome display options. The Gene Info section provides complete details about Locus Name, Gene Symbol, Synonyms, Gene Name, Gene Product Names, Gene Family, Location, Protein Domains and External Links to related databases, including TubercuList (17), TB Structural Genomics Consortium (TBSGC) Protein Structure Information (18) and the Proteome 2D-PAGE Database. Figure 1 shows the Gene Detail page for dosR (devR, Rv3133c), which encodes the response regulator of a two-component signal transduction system that tightly controls a well-studied M. tuberculosis regulon that is activated by oxygen limitation or exposure to nitric oxide (19).

Figure 1.

TBDB Gene Detail page. The Gene Detail page provides at-a-glance information for a given gene, including known names and synonyms, predicted function(s) and protein domains. It also serves as a jumping-off point to various sequence tools, and to expression data for that gene. In addition, it provides several links to external resources such as TubercuList, TBSGC Protein Structure Information and the Proteome 2D-PAGE Database at the Max Planck Institute.

Genome visualization and comparative analysis

Researchers can retrieve DNA or protein sequence for segments of any of the genome sequences in TBDB from many locations within the site, including the Browse Regions search tool. The sequences can then be visualized using a number of different tools. The Argo Genome Browser (an interactive applet) and the Feature Map (a lighter weight version of the Argo Genome Browser) provide linear views of genome sequences along with all associated annotated features. Argo in particular provides a dynamic interface to visualizing genome data that allows users to zoom from whole chromosomes to individual nucleotides, navigate within sequences, and select individual features to retrieve additional information. A Circular Genome Viewer provides a circular plot of genome sequences along with a plot of the density of particular features, GC content and GC skew. Finally, the Genome Map tool provides a dynamic linear view of one or more genome sequences and associated annotations, and displays conserved synteny between the displayed genomes for regions selected by the user (Figure 2).
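The GC content and GC skew tracks mentioned above can be illustrated with a small sliding-window calculation. The following is a minimal sketch only, not the Circular Genome Viewer's implementation: the function name `gc_windows`, the window and step sizes, and the convention of reporting a skew of 0.0 for windows without G or C are all assumptions made for illustration.

```python
def gc_windows(sequence, window=1000, step=1000):
    """Return (start, gc_content, gc_skew) for each full window of the sequence.

    gc_content = (G + C) / window length
    gc_skew    = (G - C) / (G + C), reported as 0.0 when the window has no G or C
    (an illustrative convention, not taken from TBDB).
    """
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / window
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_content, gc_skew))
    return results


if __name__ == "__main__":
    # Toy sequence; a real genome would be read from a FASTA file.
    for start, content, skew in gc_windows("GGGCAATT", window=4, step=4):
        print(start, content, skew)
```

Plotting the skew values around a circular chromosome is a common way to spot the sign changes that tend to coincide with the replication origin and terminus in bacterial genomes.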

Figure 2.

Genome Map tool. This tool provides a linear view of one or more genome sequences and associated annotations as well as conserved synteny between genomes. Annotations are provided as tracks above (forward strand) and below (reverse strand) the midline. When zoomed out, annotations are viewed as density plots; when zoomed in, individual features are displayed. Users may select regions of a genome sequence by dragging along the midline. Syntenic regions in the other sequences associated with the selection are then displayed as red bands.

A number of additional tools are provided specifically for comparative analyses between genome sequences, including the Synteny Map, Dot Plot, Operon Browser (Figure 3) and Gene Family Search. The Synteny Map uses precomputed genome alignments to graphically display regions of genomic similarity between a single reference genome and one or more other genomes, in effect providing the results of an in silico genome hybridization between sets of genomes. Using this tool, the user can select regions of interest and then click a region to zoom in and view genes, genome sequence, and features. The Dot Plot displays a navigable map of computed synteny between genomes in the form of dot-plot lines. When comparing multiple genomes, the color of the plotted synteny indicates which genome is aligned to the reference at that position. The Operon Browser is a tool that simultaneously displays the expression correlation between genes in a genomic region of the M. tuberculosis H37Rv strain while showing syntenic gene order of orthologs in related species. A heatmap derived from expression correlation data is provided along with an alignment of syntenic areas. Mousing over the genes provides additional information such as locus ID, gene symbol and description. Color coding of genes indicates orthologous relationships across different species. Finally, the Gene Family Search displays phylogenetic trees and sequence alignments of predicted orthologous gene families within the genome sequences in TBDB. The basic search feature lets the user choose the number of genomes to query and whether to limit the search to strict orthologs or not. In addition, an advanced search option lets the user choose which genomes to include or exclude.
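To make the dot-plot idea concrete, the sketch below marks every position pair at which two sequences share an exact k-mer, so that conserved, collinear regions show up as diagonal runs of points. This is a deliberately simplified stand-in: TBDB's Dot Plot is described as drawing on computed synteny, whereas the function `kmer_dot_plot`, the k-mer size and the forward-strand-only matching here are assumptions chosen for illustration.

```python
from collections import defaultdict


def kmer_dot_plot(reference, query, k=4):
    """Return sorted (ref_pos, query_pos) pairs where both sequences share an exact k-mer.

    Only the forward strand is compared; reverse-complement matches are ignored
    in this illustrative version.
    """
    # Index every k-mer of the reference by its start positions.
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)

    # Look up each query k-mer and record one dot per shared occurrence.
    dots = []
    for j in range(len(query) - k + 1):
        for i in index.get(query[j:j + k], ()):
            dots.append((i, j))
    return sorted(dots)


if __name__ == "__main__":
    # Toy sequences: the shared segment "TACGGT" yields a short diagonal.
    print(kmer_dot_plot("ACGTACGGT", "TTACGGTA", k=4))
```

Consecutive points such as (3, 1), (4, 2), (5, 3) lie on one diagonal, which is how a collinear (syntenic) segment appears in a dot plot; an inversion would instead appear as an anti-diagonal once reverse-complement matches are included.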

Figure 3.

Comparative genome analysis. The Genomes Synteny Map (A), Dot Plot (B) and Operon Map Browser (C) provide different ways to access comparative genomic data between the M. tuberculosis reference genome and selected related species. These tools provide an interactive means to explore comparative genomic data.

TBDB GENE EXPRESSION DATA

TBDB houses public and prepublication microarray and RT–PCR expression data. Public data are freely accessible and can be downloaded or reanalyzed using TBDB analysis tools. Access to prepublication data is restricted to the researchers who generated the data until they publish or decide to make their data public. TBDB users can establish a free user account to enter microarray data, share prepublication microarray or RT–PCR data with colleagues or store datasets for analysis in a data repository. Data in the repository can be shared with other researchers at the discretion of the TBDB user. Expression data in TBDB can be accessed by searching for data from individual microarrays or RT–PCR assays or by searching for data from a publication. For a novice user, the publication search is an easy place to start exploring expression data in TBDB. The expression Basic Search is an interactive search option that queries TBDB via publication, organism or dataset. The expression Advanced Search finds microarray data by experimenter, category, subcategory and organism. The Gene Search for Expression searches for genes or reporter sequences used on microarrays. Each reporter sequence corresponds to a piece of DNA deposited on a microarray slide. This search returns all microarray spots associated with a reporter sequence or gene, and the search results link to the Spot History page, which lets users explore all associated microarray data.

Expression connection

Using Expression Connection, researchers can visualize and explore clustered microarray datasets from publications whose data are present within TBDB. Clustering organizes expression data for genes or reporter sequences into groups that have similar expression profiles. This enables a user to directly view and explore already clustered data within TBDB without needing to go through the data analysis pipeline. As shown in Figure 4, a publication detail page can be accessed by following TB Expression → Gene Expression Publications → ‘Data in TBDB’. Interactive clustered data images for a publication can be navigated using GeneXplorer (20), which provides views of the most correlated genes for a gene of interest or searches for genes using text queries (Figure 4). Thus, this option enables a user to explore and interrogate TBDB for expression data from publications.
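The notion of finding the "most correlated genes for a gene of interest" can be sketched with a Pearson correlation over expression profiles. The example below is an illustration under stated assumptions, not GeneXplorer's code: the function names, the choice of Pearson correlation as the similarity measure, and the gene labels and values in the toy dataset are all invented for this sketch.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def most_correlated(profiles, gene, top=3):
    """Rank all other genes by correlation with `gene`, highest first."""
    target = profiles[gene]
    scores = [
        (other, pearson(target, values))
        for other, values in profiles.items()
        if other != gene
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top]


if __name__ == "__main__":
    # Hypothetical log-ratio profiles across four arrays (invented values).
    profiles = {
        "geneA": [1, 2, 3, 4],
        "geneB": [2, 4, 6, 8],
        "geneC": [4, 3, 2, 1],
        "geneD": [1, 3, 2, 4],
    }
    for name, r in most_correlated(profiles, "geneA"):
        print(name, round(r, 3))
```

Hierarchical clustering builds on the same pairwise similarity: genes whose profiles correlate strongly are merged first, which is why co-regulated genes end up adjacent in a clustered image.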

Figure 4.

Publication microarray data and expression connection. Researchers can access the full raw microarray data associated with a publication, either for download, or retrieval through the data retrieval and analysis pipeline. In addition, users can explore clustered microarray data, whereby they can search for particular genes, or identify which genes show coexpression across a particular dataset.

Data analysis

TBDB provides a suite of microarray data analysis tools for its users. All tools are freely available to analyze both public and prepublication data in TBDB. A typical data analysis process at TBDB involves several steps in the following order: Experiment Selection → Gene Selection and Annotation → Data Filtering Options → Data Retrieval → Gene Filtering → Clustering and Image Generation. At each step, a user is presented with various options that allow them to filter and cluster the data according to their needs. For example, a user can employ either the Basic or Advanced Expression Search to choose a set of microarray data for further analysis. Clicking on the ‘Data Retrieval and Analysis’ option invokes the data analysis pipeline, where a user can select various microarray data filtering and transformation options. Many microarray data analysis tools can be applied to datasets, including hierarchical clustering, imputation of missing values, Gene Set Enrichment Analysis (21), Singular Value Decomposition (22) and pathway analyses. All SMD analysis tools [many described previously (9)] have been made available through TBDB. At each step in the data analysis pipeline, a link to a relevant ‘Help’ page is provided, which explains in detail the various available options. In addition, the TBDB data repository provides access to the suite of gene-expression analysis tools provided through the GenePattern software (23).
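The gene filtering and missing-value imputation steps of such a pipeline can be sketched in a few lines. This is a minimal illustration, assuming a dictionary of per-gene log ratios with `None` for missing spots; the function name `filter_and_impute`, the 80% presence threshold and the use of a simple row mean for imputation are assumptions for this sketch and are not taken from the TBDB or SMD tools, which may use different methods.

```python
def filter_and_impute(matrix, min_present=0.8):
    """Drop genes with too many missing values, then fill the remaining gaps.

    matrix: dict mapping gene name -> list of log ratios (None = missing).
    A gene is kept only if at least `min_present` of its values are present;
    each remaining missing value is replaced by that gene's mean (row mean).
    """
    result = {}
    for gene, values in matrix.items():
        present = [v for v in values if v is not None]
        if not values or len(present) / len(values) < min_present:
            continue  # gene filtering step
        row_mean = sum(present) / len(present)
        result[gene] = [row_mean if v is None else v for v in values]
    return result


if __name__ == "__main__":
    # Hypothetical data for three genes across five arrays (invented values).
    data = {
        "g1": [1.0, None, 3.0, 2.0, 2.0],
        "g2": [None, None, 1.0, 1.0, 1.0],
        "g3": [0.5, 0.5, 0.5, 0.5, 0.5],
    }
    print(filter_and_impute(data))
```

Filtering before imputation matters for the later clustering step: genes with mostly missing measurements would otherwise be filled with near-constant values and could distort the correlation structure.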

Literature curation

Curating microarray expression data from publications is an important part of TBDB's efforts. We actively search PubMed for relevant publications containing microarray experiments, then obtain the raw data from researchers and load them into TBDB, with detailed experimental annotations.

FUTURE DIRECTIONS

We are working to increase the quality and quantity of data within TBDB and to incorporate additional data types. One of our priorities is to acquire host expression data from M. tuberculosis-infected tissues (mouse, primate and human). We also plan to expand TBDB's capacity to house and analyze RT–PCR data and to develop tools for comparative analysis of RT–PCR and microarray expression data. We will also implement tools such as GO::TermFinder (24), which allows users to determine whether there are biological themes associated with a list of genes of interest, and tools for the analysis of replicate microarray experiments. We are also working to improve the depth and quality of our genome annotations. We are currently curating TB literature and associating these data with genes and other genomic features. Moreover, we have implemented and will deploy a community annotation infrastructure to allow TB researchers to submit additions and improvements to existing annotations through the TBDB website. We are also using the comparative sequence data integrated into TBDB to improve the accuracy of structural gene annotations and to predict additional potential noncoding genes. Finally, as new TB sequences are produced by the Broad Microbial Sequencing Center, they will be deposited and made publicly available in TBDB. Ultimately, we hope that TBDB will serve as a community hub for TB research. A TB research community information page will be implemented with a listing of TB research labs and colleagues; it will also provide a forum where users can offer feedback and suggestions that will help us better serve them.
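The question of whether a gene list carries a "biological theme" is commonly framed as an over-representation test. The sketch below computes a one-sided hypergeometric p-value for a single annotation term. It is an illustration only: the function name and arguments are hypothetical, and whether a TBDB deployment would use exactly this test or apply a multiple-testing correction is an assumption not addressed here.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of genes in the genome (background)
    annotated:  number of background genes carrying the term
    sample:     size of the gene list of interest
    hits:       number of listed genes carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 genes in total, 4 annotated with the term,
    # a list of 3 genes of which 2 carry the term.
    print(enrichment_p_value(population=10, annotated=4, sample=3, hits=2))
```

In practice many terms are tested at once, so the raw p-values would need a multiple-testing correction before any term is reported as enriched.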

CONCLUSION

TBDB contains annotated genome and expression (microarray and RT–PCR) data and a suite of data analysis tools designed to serve as a unique resource for TB research and for the discovery of new drugs, vaccines and biomarkers. Data within the TBDB and all analysis tools are freely available to researchers. Only prepublication gene-expression data require a password.

FUNDING

The Bill and Melinda Gates Foundation. Funding for open access charge: The Bill and Melinda Gates Foundation. Conflict of interest statement. None declared.

24 in total

1. GenePattern 2.0.

Authors: Michael Reich; Ted Liefeld; Joshua Gould; Jim Lerner; Pablo Tamayo; Jill P Mesirov
Journal: Nat Genet Date: 2006-05 Impact factor: 38.330

2. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.

Authors: T M Lowe; S R Eddy
Journal: Nucleic Acids Res Date: 1997-03-01 Impact factor: 16.971

3. Regulation of the Mycobacterium tuberculosis hypoxic response gene encoding alpha -crystallin.

Authors: D R Sherman; M Voskuil; D Schnappinger; R Liao; M I Harrell; G K Schoolnik
Journal: Proc Natl Acad Sci U S A Date: 2001-06-19 Impact factor: 11.205

4. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors: Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal: Proc Natl Acad Sci U S A Date: 2005-09-30 Impact factor: 11.205

Review 5. Learning from the genome sequence of Mycobacterium tuberculosis H37Rv.

Authors: S T Cole
Journal: FEBS Lett Date: 1999-06-04 Impact factor: 4.124

6. Tuberculosis Infection: Insight from Immunogenomics.

Authors: Matthew Arentz; Thomas R Hawn
Journal: Drug Discov Today Dis Mech Date: 2007

7. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence.

Authors: S T Cole; R Brosch; J Parkhill; T Garnier; C Churcher; D Harris; S V Gordon; K Eiglmeier; S Gas; C E Barry; F Tekaia; K Badcock; D Basham; D Brown; T Chillingworth; R Connor; R Davies; K Devlin; T Feltwell; S Gentles; N Hamlin; S Holroyd; T Hornsby; K Jagels; A Krogh; J McLean; S Moule; L Murphy; K Oliver; J Osborne; M A Quail; M A Rajandream; J Rogers; S Rutter; K Seeger; J Skelton; R Squares; S Squares; J E Sulston; K Taylor; S Whitehead; B G Barrell
Journal: Nature Date: 1998-06-11 Impact factor: 49.962

8. Rfam: annotating non-coding RNAs in complete genomes.

Authors: Sam Griffiths-Jones; Simon Moxon; Mhairi Marshall; Ajay Khanna; Sean R Eddy; Alex Bateman
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

9. GeneXplorer: an interactive web application for microarray data visualization and analysis.

Authors: Christian A Rees; Janos Demeter; John C Matese; David Botstein; Gavin Sherlock
Journal: BMC Bioinformatics Date: 2004-10-01 Impact factor: 3.169

10. MGDD: Mycobacterium tuberculosis genome divergence database.

Authors: Anchal Vishnoi; Alok Srivastava; Rahul Roy; Alok Bhattacharya
Journal: BMC Genomics Date: 2008-08-05 Impact factor: 3.969
