
IDBD: infectious disease biomarker database.

In Seok Yang, Chunsun Ryu, Ki Joon Cho, Jin Kwang Kim, Swee Hoe Ong, Wayne P Mitchell, Bong Su Kim, Hee-Bok Oh, Kyung Hyun Kim.

Abstract

Biomarkers enable early diagnosis, guide molecularly targeted therapy and monitor disease activity and therapeutic responses across a variety of diseases. Despite intensified interest and research, however, the overall rate of development of novel biomarkers has been falling. Moreover, no solution is yet available that efficiently retrieves and processes biomarker information pertaining to infectious diseases. The Infectious Disease Biomarker Database (IDBD) is one of the first efforts to build an easily accessible and comprehensive literature-derived database covering known infectious disease biomarkers. IDBD is a community annotation database that uses collaborative Web 2.0 features and provides a convenient user interface for entering and revising data online. Its search tools let users link infectious diseases or pathogens to protein, gene or carbohydrate biomarkers. It supports various types of data search and provides tools to analyze the sequence and structural features of potential and validated biomarkers. Currently, IDBD integrates 611 biomarkers for 66 infectious diseases and 70 pathogens. It is publicly accessible at http://biomarker.cdc.go.kr and http://biomarker.korea.ac.kr.

Year: 2007 PMID: 17982173 PMCID: PMC2238845 DOI: 10.1093/nar/gkm925

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048

INTRODUCTION

Infectious diseases remain among the leading causes of death and disability worldwide. About 15 million (>25%) of 57 million annual deaths are estimated to be related directly to infectious diseases (1). Newly emerging and re-emerging infectious diseases constitute an urgent and ongoing threat to public health throughout the world. The discovery of acquired immune deficiency syndrome (AIDS) led to a renewed appreciation of the consequences of emerging infectious diseases. Severe acute respiratory syndrome (SARS) emerged in southern China in 2002 and has had a profound impact on public health (2). Influenza viruses possess evolutionary agility and the capacity to jump between fowl, farm animal and human species (3). Just as troubling are chronic infections, which cause persistent social and economic havoc. Recent studies have shown that the burden of morbidity and mortality associated with certain infectious diseases falls primarily on infants and young children (4), with long-term social and economic consequences. Surveillance and early response to infectious diseases depend on rapid clinical diagnosis and detection, which, when available, can reduce suffering and economic loss. Biomarkers, molecules that can be measured sensitively in the human body, are by definition potentially diagnostic. The value of biomarkers for infectious diseases lies in their capacity to provide early detection, establish highly specific diagnoses, determine accurate prognoses, direct molecular-based therapy and monitor disease progression (5). They are increasingly important in both therapeutic and diagnostic processes and have high potential to guide preventive interventions. Vast resources have been devoted to identifying and developing biomarkers that can help guide treatment decisions for patients. Furthermore, there is a growing consensus that multiple markers will be required for most diagnoses, whereas single markers may suffice only in selected cases.
Despite intensified interest and research, however, the rate of development of novel biomarkers has been falling (6), suggesting that a resource that leverages existing data is overdue. At present, databases containing information about biomarkers focus predominantly on cancer: early detection research network (7), gastric cancer knowledgebase (8), integrated cancer biomarker information system (9) and database for cancer, asthma and autism for children's study (10). Moreover, although 15–20% of cancers are linked to infectious diseases and chronic infection can cause cancer (11), no systematic effort has been described to integrate information from the cancer biomarker and infectious disease domains. To advance our understanding of biomarkers and their roles in early infection processes, we have developed an integrated, user-friendly relational database that catalogs putative and validated biomarkers and relates them to infectious disease processes. In addition, we have added value by hosting various bioinformatics tools that can be used to analyze and visualize the biomarker data. This freely accessible resource will be a valuable research tool and contribute to improved public health.

OVERVIEW OF THE DATABASE

The Infectious Disease Biomarker Database (IDBD) is a community annotation database of biomarkers, with interfaces that let users edit its content directly and track editing history, thus capturing community knowledge and expertise. It was designed to collect, store and display information about biomarkers, coupled with research tools for sequence and structural analysis of the data. IDBD currently includes information on 611 biomarkers from 66 infectious diseases and 70 pathogens (Table 1). Biomarkers were classified by their use in detection, diagnosis, pathogen typing or as virulence factors in clinical or epidemiological studies and applications. Validated biomarkers were defined as representative markers that have been experimentally verified, for example for the detection and diagnosis of infectious diseases in reference and specialized laboratories or in the scientific literature. Potential biomarkers were defined as those frequently cited in the context of detection and diagnosis of infectious disease in recently published research journals. The correct assignment of biomarker subtypes and the evaluation of potential or validated biomarkers depend critically on expert groups in infectious disease research. The IDBD data are updated and modified regularly by a curation team composed of researchers at the Center for Infectious Diseases and the Center for Immunology and Pathology of the Korea National Institute of Health (KNIH) in Seoul. The content of IDBD is open and freely accessible to the general public. IDBD is part of the National Disease Biomarker Bank project, an integrated framework for identifying, collecting, distributing and managing biomarkers, which is being developed at KNIH.

Table 1.

Current number of biomarker entries in IDBD

Disease groups	Disease	Pathogen	Biomarker
Gastrointestinal infection	14	14	107
Respiratory infection	16	18	154
Neurological infection	2	1	14
Urogenital infection^a	9	10	46
Viral hepatitis^a	5	5	10
Hemorrhagic fever	4	4	37
Zoonosis	7	6	87
Arbovirus infection	5	5	110
Antibiotics resistance^a	6	6	83
Bioterrorism	8	8	83
Total^b	66	70	611

^a Diseases, pathogens and biomarkers in this group do not overlap with other groups.

^b Total number of distinct diseases, pathogens and biomarkers (entries shared between groups are counted once).
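Footnote b means the Total row counts every disease, pathogen and biomarker once. It is therefore smaller than the column sums whenever an entry appears in more than one group. The Python sketch below checks this against the numbers in Table 1. It also shows that 611 biomarkers over 70 pathogens averages about 8.7 per pathogen. The function names are illustrative, and the two toy pathogen sets at the end are hypothetical.

```python
# Per-group counts transcribed from Table 1: (diseases, pathogens, biomarkers).
TABLE_1 = {
    "Gastrointestinal infection": (14, 14, 107),
    "Respiratory infection": (16, 18, 154),
    "Neurological infection": (2, 1, 14),
    "Urogenital infection": (9, 10, 46),
    "Viral hepatitis": (5, 5, 10),
    "Hemorrhagic fever": (4, 4, 37),
    "Zoonosis": (7, 6, 87),
    "Arbovirus infection": (5, 5, 110),
    "Antibiotics resistance": (6, 6, 83),
    "Bioterrorism": (8, 8, 83),
}
# Distinct totals from the last row of Table 1 (footnote b).
DISTINCT_TOTALS = (66, 70, 611)


def column_sums(table):
    """Sum each column over groups; an entry shared by k groups is counted k times."""
    return tuple(sum(row[i] for row in table.values()) for i in range(3))


def extra_memberships(table, distinct):
    """How many group memberships exceed the distinct count in each column."""
    return tuple(s - d for s, d in zip(column_sums(table), distinct))


def distinct_count(groups):
    """Number of distinct entries: the size of the union of the per-group sets."""
    return len(set().union(*groups.values()))


if __name__ == "__main__":
    print(column_sums(TABLE_1))                          # (76, 77, 731)
    print(extra_memberships(TABLE_1, DISTINCT_TOTALS))   # (10, 7, 120)
    print(round(DISTINCT_TOTALS[2] / DISTINCT_TOTALS[1], 2))  # 8.73 biomarkers per pathogen
    # Hypothetical toy example: pathogen P2 is listed under two groups.
    toy = {"Zoonosis": {"P1", "P2"}, "Bioterrorism": {"P2", "P3"}}
    print(distinct_count(toy))  # 3 distinct pathogens, although the rows sum to 4
```

The difference between a column sum and its total (for example, 731 minus 611 = 120 for biomarkers) counts extra group memberships, not extra entries.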

DATABASE DESIGN AND CONTENTS

IDBD consists primarily of three main tables, disease, pathogen and biomarker, arranged in one Oracle schema (12). Infectious diseases are divided into 10 subgroups according to the infection site and the unique features of the pathogen or disease: gastrointestinal infection, respiratory infection, neurological infection, urogenital infection, viral hepatitis, hemorrhagic fever, zoonosis, arbovirus infection, antibiotics resistance and bioterrorism. Together, the 10 subgroups contain 66 diseases (Table 1). Each disease in IDBD is characterized by a number of attributes, such as general information, pathogen, infection, symptom, diagnosis, treatment and prevention. Pathogens are grouped into bacteria, viruses, fungi and parasites. The 70 pathogens currently included are mostly bacteria and viruses. Each pathogen is characterized by general information, disease, biomarker list and related infection. Validated and potential biomarker entries are divided into three categories (detection/diagnosis, pathogen typing and virulence factor), and users can access the data according to these criteria (Figure 1A).
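The three-table design can be sketched as a small relational schema. This is a minimal illustration under stated assumptions. SQLite stands in for Oracle, and all table and column names are hypothetical. The disease_pathogen link table is only a guess at how many-to-many relations might be stored. The query groups one pathogen's biomarkers into the three categories, mirroring the pathogen page in Figure 1A.

```python
import sqlite3

# Hypothetical schema. IDBD runs on Oracle 10g; SQLite is used here only so the
# sketch is self-contained, and all table/column names are illustrative.
SCHEMA = """
CREATE TABLE disease (
    disease_id     INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    subgroup       TEXT NOT NULL              -- one of the 10 subgroups, e.g. 'Zoonosis'
);
CREATE TABLE pathogen (
    pathogen_id    INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE,
    pathogen_class TEXT NOT NULL
        CHECK (pathogen_class IN ('bacteria', 'virus', 'fungi', 'parasite'))
);
-- A disease can have several pathogens and a pathogen several diseases.
CREATE TABLE disease_pathogen (
    disease_id     INTEGER NOT NULL REFERENCES disease(disease_id),
    pathogen_id    INTEGER NOT NULL REFERENCES pathogen(pathogen_id),
    PRIMARY KEY (disease_id, pathogen_id)
);
CREATE TABLE biomarker (
    biomarker_id   INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    pathogen_id    INTEGER NOT NULL REFERENCES pathogen(pathogen_id),
    category       TEXT NOT NULL
        CHECK (category IN ('detection/diagnosis', 'pathogen typing', 'virulence factor')),
    status         TEXT NOT NULL CHECK (status IN ('validated', 'potential'))
);
"""


def biomarkers_by_category(conn, pathogen_name):
    """Group one pathogen's biomarkers by category, as on a pathogen page."""
    rows = conn.execute(
        """SELECT b.category, b.name
             FROM biomarker AS b JOIN pathogen AS p ON p.pathogen_id = b.pathogen_id
            WHERE p.name = ?
            ORDER BY b.category, b.name""",
        (pathogen_name,),
    )
    grouped = {}
    for category, name in rows:
        grouped.setdefault(category, []).append(name)
    return grouped


def build_demo():
    """In-memory database filled with placeholder rows (not IDBD content)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO pathogen VALUES (1, 'Pathogen X', 'bacteria')")
    conn.executemany(
        "INSERT INTO biomarker VALUES (?, ?, ?, ?, ?)",
        [(1, "Marker A", 1, "detection/diagnosis", "validated"),
         (2, "Marker B", 1, "pathogen typing", "potential"),
         (3, "Marker C", 1, "virulence factor", "validated")],
    )
    return conn


if __name__ == "__main__":
    print(biomarkers_by_category(build_demo(), "Pathogen X"))
```

The CHECK constraints keep the category and status vocabularies closed. That is one way to make sure every entry falls into exactly one of the three categories described above.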

Figure 1.

A screenshot of IDBD showing (A) the list of biomarkers on a pathogen page, classified by detection/diagnosis, pathogen typing and virulence factor, and the biomarker subtype page, (B) the alphabetical list of biomarkers with two search options, Simple Search and Complex Search, (C) the Complex Search page, (D) search results and (E) retrieved sequences viewed in a separate window.

The database contains approximately 8–9 biomarkers per pathogen (Table 1), comprising proteins, nucleic acids, carbohydrates and small molecules. Each biomarker entry contains several categories of information: general information, detection, mechanism, pathogen link, sequence information, NCBI link, secondary structure, tertiary structure, PDB link and references. Biological functions or roles of biomarkers are also included where available. Sequences are obtained from the protein and nucleic acid sequence databases of the National Center for Biotechnology Information (NCBI) (13). Secondary structure information is obtained from PDBsum at the European Bioinformatics Institute (EBI) (14). Tertiary structure information is obtained from the Protein Data Bank (PDB) (15).
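A biomarker entry with the categories listed above might be modelled in memory as a simple record. The sketch below is hypothetical. Its field names are derived from the category list. The NCBI and RCSB PDB URL patterns are today's public record-page formats, assumed here for illustration rather than taken from IDBD.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class BiomarkerRecord:
    """Hypothetical view of one biomarker entry; field names are illustrative."""
    name: str
    general_information: str = ""
    detection: str = ""
    mechanism: str = ""
    pathogen_link: str = ""
    sequence: str = ""
    ncbi_accession: Optional[str] = None
    secondary_structure: str = ""
    tertiary_structure: str = ""
    pdb_id: Optional[str] = None
    references: List[str] = field(default_factory=list)

    def ncbi_url(self, db: str = "protein") -> Optional[str]:
        # Assumed current NCBI record-page pattern; None when no accession is stored.
        if not self.ncbi_accession:
            return None
        return f"https://www.ncbi.nlm.nih.gov/{db}/{self.ncbi_accession}"

    def pdb_url(self) -> Optional[str]:
        # RCSB PDB entry page; only built when a structure is available.
        return f"https://www.rcsb.org/structure/{self.pdb_id}" if self.pdb_id else None

    def empty_sections(self) -> List[str]:
        """Categories that are still blank (many entries lack structures, for example)."""
        return [f.name for f in fields(self) if f.name != "name" and not getattr(self, f.name)]
```

Keeping the accession and PDB identifiers as plain fields, and deriving the links from them, means a link can only be shown when the underlying identifier exists. This matches the "if available" handling of structure data described in the text.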

DATA RETRIEVAL

Biomarker data can be retrieved efficiently through dedicated entry points for search. Users can access biomarker records from the front page by clicking Biomarker in the top menu, which allows browsing of biomarkers in alphabetical order (Figure 1B). Two search options are provided. Simple Search queries the complete database after the user selects one or all three groups of biomarkers (protein, nucleic acid and carbohydrate). Only one small molecule, a catechol siderophore, is currently deposited. Specific queries can be defined with Complex Search (Figure 1C), where fields of interest (class of pathogen, molecular type, pathogen name, biomarker name and NCBI accession number) are selected from pull-down menus. Users can add or remove fields with the Add and Delete buttons and combine them with AND or OR operators. Both Simple and Complex Search return a list of distinct biomarkers that match the search criteria (Figure 1D). Each listed biomarker is linked to its detailed information (general information, detection, mechanism, etc.), which can be viewed by clicking its name. The sequences of the selected biomarkers can be retrieved in a separate window (Figure 1E), as amino acid sequences in single-letter code (proteins) or as nucleotide sequences in FASTA format (nucleic acids).

The complete contents of IDBD, as well as other Web resources, can be searched via a user-friendly interface. Three search options are provided: Internal, External and News Search (Figure 2A). Internal Search queries either titles or content, extracting textual content stored in the IDBD database, including diseases, pathogens and biomarkers (Figure 2B).
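The Complex Search and sequence-retrieval workflow (Figure 1C–E) can be illustrated with a short sketch. It rests on three assumptions: rows are evaluated strictly left to right, a field matches when it contains the query text (case-insensitive), and results are deduplicated by biomarker name. None of these rules is specified for IDBD. The record keys and placeholder records are also hypothetical. For simplicity, the sketch writes both protein and nucleotide sequences as FASTA.

```python
from typing import Dict, List, Tuple

# One Complex Search row: (operator joining it to earlier rows, field, value).
# The first row's operator is ignored. Record keys and the substring matching
# rule are hypothetical; IDBD's actual query logic is not described.
Criterion = Tuple[str, str, str]
Record = Dict[str, str]


def matches(record: Record, criteria: List[Criterion]) -> bool:
    """Evaluate rows strictly left to right (assumed); an empty query matches nothing."""
    result = None
    for op, field_name, value in criteria:
        hit = value.lower() in record.get(field_name, "").lower()
        if result is None:
            result = hit
        elif op == "AND":
            result = result and hit
        elif op == "OR":
            result = result or hit
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return bool(result)


def complex_search(records: List[Record], criteria: List[Criterion]) -> List[Record]:
    """Return distinct matching biomarkers, keeping the first record seen per name."""
    seen, hits = set(), []
    for rec in records:
        if rec["biomarker"] not in seen and matches(rec, criteria):
            seen.add(rec["biomarker"])
            hits.append(rec)
    return hits


def to_fasta(records: List[Record], width: int = 60) -> str:
    """Write records as FASTA: a '>' header line, then the sequence in fixed-width lines."""
    lines = []
    for rec in records:
        lines.append(f">{rec['accession']} {rec['biomarker']}")
        seq = rec["sequence"]
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines)


# Hypothetical placeholder records (not IDBD data).
DEMO_RECORDS = [
    {"biomarker": "Marker A", "pathogen": "Pathogen X", "molecular type": "protein",
     "accession": "ACC0001", "sequence": "MKT" * 30},
    {"biomarker": "Marker B", "pathogen": "Pathogen Y", "molecular type": "nucleic acid",
     "accession": "ACC0002", "sequence": "ACGTACGT"},
]

if __name__ == "__main__":
    query = [("", "pathogen", "pathogen x"), ("AND", "molecular type", "protein")]
    print(to_fasta(complex_search(DEMO_RECORDS, query)))
```

Under left-to-right evaluation, A OR B AND C is read as (A OR B) AND C. A system that gives AND higher precedence would instead read it as A OR (B AND C), so the evaluation order matters once queries grow beyond two rows.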
With External Search, users can submit a single query to NCBI PubMed, the NCBI Entrez sequence databases, the PDB structure database and the Centers for Disease Control and Prevention (CDC) (Figure 2C). Details of published articles, sequence and structural data, or disease and pathogen information are shown in summary form and can be sorted by information item. More detailed information can be retrieved by clicking the title of a sorted search result. News Search provides accurate and timely information on outbreaks and sporadic cases of infectious diseases. Users can search current news or archives by entering keywords, drawing on sources that include BBC News, ScienceNow from Science magazine and Yahoo! News. More detailed information can be displayed by clicking the title of a news article; articles are sorted by date.
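For the PubMed and Entrez sequence parts of External Search, NCBI's public E-utilities ESearch endpoint is one way a single query can be forwarded. The text does not say how IDBD submits external queries, so this sketches only the public API. The helper names are hypothetical. The PDB and CDC searches use different interfaces and are not covered. The same helper can target the sequence databases by passing db="protein" or db="nucleotide".

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "pubmed", retmax: int = 20) -> str:
    """Build an ESearch URL that asks for a JSON reply."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_ids(payload: str) -> list:
    """Pull the record IDs (PMIDs for PubMed) out of an ESearch JSON reply."""
    return json.loads(payload)["esearchresult"]["idlist"]


def search_ncbi(term: str, db: str = "pubmed", retmax: int = 20) -> list:
    # Performs a network request. NCBI asks clients without an API key
    # to send no more than three requests per second.
    with urlopen(esearch_url(term, db, retmax), timeout=30) as resp:
        return parse_ids(resp.read().decode("utf-8"))
```

Separating URL construction and parsing from the network call keeps the two pure helpers testable offline. Only search_ncbi touches the network.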

Figure 2.

Search examples showing (A) three search options: Internal, External and News Search, (B) Internal Search results and (C) External Search results.

DATA ANALYSIS

On the biomarker list page, IDBD provides access to sequence and structure analysis tools via ‘Data analysis new window’ in the left-hand navigation bar. Users can select sequences of interest from a biomarker search (Figure 1E) and prepare input data by clicking ‘Sequence alignment’. They can then run a multiple sequence alignment with the CLUSTALW tool of EBI (16), either by direct submission or by uploading a file of the chosen sequences. On the result page (Figure 3A), the alignment can be exported for phylogenetic tree construction. Sequence analysis also includes standard BLAST services (blastn and blastp) accessed over external network connections (17), pairwise distance calculation and synonymous/non-synonymous substitution ratio analysis. For structure analysis, users may submit one amino acid sequence or multiple aligned sequences to methods for secondary (PSA) and tertiary (Geno3D) structure prediction (18,19). The predicted tertiary structures can be visualized with Jmol from either a PDB file or a PDB ID, if available (Figure 3B).
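The text does not say how IDBD computes pairwise distances. As an illustration only, the sketch below computes two standard measures for nucleotide data on aligned sequences: the uncorrected p-distance and the Jukes–Cantor correction, d = -3/4 ln(1 - 4p/3). It is not IDBD's code. Skipping gapped columns pairwise is an assumption, and protein data would need a different correction.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected nucleotide distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(aln: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """All pairwise p-distances, e.g. as input for a distance-based tree method."""
    return {(i, j): p_distance(aln[i], aln[j]) for i, j in combinations(aln, 2)}


if __name__ == "__main__":
    p = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # 2 differences in 10 sites
    print(p, round(jukes_cantor(p), 4))          # 0.2 0.2326
```

The correction grows faster than p as sequences diverge, because it accounts for multiple substitutions at the same site that a simple mismatch count cannot see.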

Figure 3.

The sequence and structure analysis tools showing (A) analysis results using selected sequences of biomarker data: sequence alignment and phylogenetic tree, and (B) predicted 3D structure of a biomarker viewed with Jmol.

FUTURE DIRECTIONS

Pathogen-specific biomarkers can provide information necessary for the diagnosis, detection and treatment of various infectious diseases. Recent concerns about bioterrorism and emerging infectious diseases have led to a new focus on the development of biomarkers, and molecular diagnostics for infectious diseases is growing rapidly. Currently, IDBD contains 611 biomarkers, of which 239 are validated. The content of IDBD is expanding rapidly. Our goal is to collect a complete dataset of validated and potential biomarkers used in the detection of infectious diseases and to build a knowledgebase for users interested in the discovery of infectious disease biomarkers. The main future challenge is to keep IDBD up to date with the growing number of biomarkers that are experimentally verified and published in peer-reviewed journals. We therefore plan to implement text-mining support for database curation in the near future. Toward this goal, a network of biomarker expert groups has been formed to coordinate database management and the validation of novel biomarkers. It brings together the Center for Infectious Diseases and the Center for Immunology and Pathology at KNIH with an advisory committee outside KNIH. Another challenge is to provide IDBD with regional epidemiological trends for some infectious diseases. Epidemiological surveillance is key to controlling and monitoring diseases such as cholera, shigellosis, typhoid fever, paratyphoid fever and rabies. Information on infectious agents is essential for preventing and controlling the spread of infectious diseases. The relevant information therefore needs to be collected, analyzed and published on a regional scale, to the benefit of both researchers studying infectious diseases and the public. New insights into innate immunity initiated by host–pathogen interaction are changing the way we think about the pathogenesis of infectious diseases.
Different approaches are used to characterize immune responses by evaluating the epitopes recognized by antigen-specific receptors of the immune system. We intend to build additional tools for molecular immunological and etiological applications. IDBD will collect intrinsic epitope features, seasonal patterns of sequences and T-cell and B-cell epitope responses, and will develop software for epitope analysis and prediction.

USER MANAGEMENT

The IDBD management system allows users to access the biomarker database without registration. However, user registration is required for adding and editing database contents, and user support can be obtained by e-mailing graduate@korea.ac.kr. Readers are encouraged to contact us if they wish to provide new data for inclusion in IDBD, assist with curation or have any suggestions for improvements.

IMPLEMENTATION

IDBD was developed as a relational database using Oracle 10g (12) on the Windows operating system. Two open-source packages, the Apache HTTP Server and Apache Tomcat, were used as the HTTP server and the servlet container for the Web service, respectively. Perl scripts were used to provide a common gateway interface (CGI) for sequence alignment with ClustalW, and a Java applet was used to embed Jmol for displaying 3D models. IDBD can be accessed publicly from any Web browser at http://biomarker.cdc.go.kr and http://biomarker.korea.ac.kr.

REFERENCES

H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne. The Protein Data Bank. Nucleic Acids Res, 2000-01-01.

Christophe Combet; Martin Jambon; Gilbert Deléage; Christophe Geourjon. Geno3D: automatic comparative molecular modelling of protein. Bioinformatics, 2002-01.

Monya Baker. In biomarkers we trust? Nat Biotechnol, 2005-03.

Nader Rifai; Michael A Gillette; Steven A Carr. Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat Biotechnol, 2006-08.

P R Srinivas; B S Kramer; S Srivastava. Trends in biomarker research for cancer detection. Lancet Oncol, 2001-11.

Wendong Li; Zhengli Shi; Meng Yu; Wuze Ren; Craig Smith; Jonathan H Epstein; Hanzhong Wang; Gary Crameri; Zhihong Hu; Huajun Zhang; Jianhong Zhang; Jennifer McEachern; Hume Field; Peter Daszak; Bryan T Eaton; Shuyi Zhang; Lin-Fa Wang. Bats are natural reservoirs of SARS-like coronaviruses. Science, 2005-09-29.

Donna Maglott; Jim Ostell; Kim D Pruitt; Tatiana Tatusova. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res, 2006-12-05.

Roman A Laskowski; Victor V Chistyakov; Janet M Thornton. PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids. Nucleic Acids Res, 2005-01-01.

Bernett T K Lee; Chun Meng Song; Boon Huat Yeo; Cheuk Wang Chung; Ying Leong Chan; Teng Ting Lim; Yen Bing Chua; Marie C S Loh; Boon Keong Ang; Praveen Vijayakumar; Lailing Liew; Jiahao Lim; Yun Ping Lim; Chee Hong Wong; Danny Chuon; Gunaretnam Rajagopal; Jeffrey Hill. Gastric Cancer (Biomarkers) Knowledgebase (GCBKB): A Curated and Fully Integrated Knowledgebase of Putative Biomarkers Related to Gastric Cancer. Biomark Insights, 2007-02-07.

David M Morens; Gregory K Folkers; Anthony S Fauci. The challenge of emerging and re-emerging infectious diseases. Nature, 2004-07-08.
