
AgdbNet - antigen sequence database software for bacterial typing.

Abstract

BACKGROUND: Bacterial typing schemes based on the sequences of genes encoding surface antigens require databases that provide a uniform, curated, and widely accepted nomenclature for the variants identified. Because typing schemes differ according to the genes they target, creating these databases has typically required one-off code to be written to link each database to a web interface. Here we describe agdbNet, widely applicable web database software that facilitates simultaneous BLAST querying of multiple loci using either nucleotide or peptide sequences.
RESULTS: Databases are described by XML files that are parsed by a Perl CGI script. Each database can have any number of loci, which may be defined by nucleotide and/or peptide sequences. The software is currently in use on at least five public databases for the typing of Neisseria meningitidis, Campylobacter jejuni and Streptococcus equi, and can be set up to query internal isolate tables or suitably configured external isolate databases, such as those used for multilocus sequence typing. The style of the resulting website can be fully configured by modifying stylesheets and by using customised header and footer files that surround the output of the script.
CONCLUSION: The software provides a rapid means of setting up customised Internet antigen sequence databases. The flexible configuration options enable typing schemes with differing requirements to be accommodated.

Year: 2006 PMID: 16790057 PMCID: PMC1543660 DOI: 10.1186/1471-2105-7-314

Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169

Background

The wide availability of molecular techniques, especially high-throughput nucleotide sequence determination, has enabled various typing schemes that were initially based on the reaction of bacterial surface proteins with immunological reagents to be redefined on the basis of the deduced peptide sequences of the variants targeted. This paradigm shift has generated a need to make variant sequences publicly available, both to facilitate the identification of known variants and to ensure the integrity of a unified nomenclature system. Web-accessible databases that archive nucleotide or peptide sequence data are an ideal means of achieving this. A challenge for the design of generic software for such databases is that schemes vary in the way that variants are defined and in the number of loci that make up a 'strain' definition. For example, some schemes involve the identification of short peptides located in one or more surface-exposed loops of an antigen [1,2], whereas others use longer nucleotide sequences [3] or peptide sequences, often in conjunction with the corresponding nucleotide sequences [4,5]. For all typing schemes, it is essential that there is broad acceptance of the definition of variants, and a central repository of variant designations needs to be maintained and curated for accuracy. Such a repository is preferable to the deposition of variant sequences in an archival database such as GenBank, where no checks are made on sequence quality and submitters are free to define variants as they wish. Because of the variation among schemes, setting up specialised databases usually requires bespoke code to be written for the interface between the web server and the database engine. Here we describe a configurable software package that enables the rapid construction of these types of sequence databases, allowing queries with either nucleotide or peptide sequences, the querying of multiple loci together, and the download of the sequences.
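Supporting both nucleotide and peptide queries against both kinds of sequence table amounts to choosing among the four standard BLAST programs. The following Python sketch (agdbNet itself is written in Perl) classifies a pasted query and picks the program; the character-composition heuristic and its 90% threshold are assumptions made for illustration, not agdbNet's documented rule.

```python
# Sketch: choose a BLAST program from the query type and the target table type.
# ASSUMPTION: the composition heuristic and 90% threshold are illustrative only.

NUCLEOTIDE_CHARS = set("ACGTUN")


def guess_sequence_type(seq: str, threshold: float = 0.9) -> str:
    """Return 'nucleotide' or 'peptide' for a raw query sequence."""
    # Remove whitespace and gap characters, ignore case.
    cleaned = "".join(seq.split()).upper().replace("-", "")
    if not cleaned:
        raise ValueError("empty sequence")
    nt_fraction = sum(c in NUCLEOTIDE_CHARS for c in cleaned) / len(cleaned)
    return "nucleotide" if nt_fraction >= threshold else "peptide"


# Standard BLAST program for each (query type, table type) combination.
BLAST_PROGRAM = {
    ("nucleotide", "nucleotide"): "blastn",  # DNA query vs DNA table
    ("nucleotide", "peptide"): "blastx",     # translated DNA query vs protein table
    ("peptide", "nucleotide"): "tblastn",    # protein query vs translated DNA table
    ("peptide", "peptide"): "blastp",        # protein query vs protein table
}


def choose_blast_program(query: str, table_type: str) -> str:
    """Pick the BLAST program for searching one sequence table."""
    return BLAST_PROGRAM[(guess_sequence_type(query), table_type)]
```

A peptide made up almost entirely of A, C, G and T residues would be misclassified by this heuristic, so an interface might also let the user state the sequence type explicitly.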

Implementation

The agdbNet package runs on Linux or UNIX systems using the PostgreSQL database and the Apache web server. The core software is written in Perl as a single mod_perl-compatible CGI script that interfaces with BLAST [6]. BLAST is an essential component of the system, but other applications may optionally be installed to enhance functionality; for example, EMBOSS [7] is used to generate sequence alignments with the nearest alleles and peptides, and Bioperl [8] allows sequences to be downloaded in multiple formats. A configuration file defines the paths to BLAST and the other helper applications, the working directories and site-wide options. The structure of each individual database is described by an XML configuration file; the XML parsing functionality was derived from code written for multilocus sequence typing databases [9,10]. Every database XML file has a tag containing database-specific configuration options, such as the name of the database, the local path to the web root and a text description of the database. There will also be at least one set of tags describing sequence tables, which enclose tags for nucleotide sequences, peptide sequences or both. Any number of fields may be defined within these tables, with options controlling whether each is displayed in the main results table following a query. A database can also contain an isolate table holding information about representative or reference isolates that exhibit a given antigen. It is also possible to define an external isolate database table that can be queried for isolates with a matching antigen; such searches require the remote system to be configured to accept connections on the PostgreSQL port and remote queries to the particular database in question. To add to and edit the database, a Perl script is provided that runs a private web interface for the curator. This interface enables sequences to be added rapidly and automatically performs a data integrity check.
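The following Python sketch illustrates how a per-database XML description of sequence tables might be read, and what a simple curator-side integrity check could look like. The element and attribute names (`database`, `system`, `sequences`, `table`, `locus`, `type`) are hypothetical and do not reproduce agdbNet's schema, and the two checks shown (valid alphabet, no duplicate sequence) are assumptions about what a data integrity check might cover.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL configuration: element and attribute names are invented.
CONFIG = """
<database>
  <system name="example_vr" description="Example variable region database"/>
  <sequences locus="VR1">
    <table name="vr1_peptides" type="peptide"/>
  </sequences>
  <sequences locus="VR2">
    <table name="vr2_alleles" type="nucleotide"/>
    <table name="vr2_peptides" type="peptide"/>
  </sequences>
</database>
"""

# Allowed characters for each sequence type (unambiguous codes only).
ALPHABETS = {
    "nucleotide": set("ACGT"),
    "peptide": set("ACDEFGHIKLMNPQRSTVWY"),
}


def load_tables(xml_text: str) -> dict:
    """Map each sequence table name to its locus and sequence type."""
    root = ET.fromstring(xml_text)
    tables = {}
    for seqs in root.iter("sequences"):
        for table in seqs.findall("table"):
            tables[table.get("name")] = {
                "locus": seqs.get("locus"),
                "type": table.get("type"),
            }
    return tables


def check_new_sequence(seq: str, table: dict, existing: set) -> list:
    """Return a list of problems; an empty list means the sequence passes.

    `existing` is assumed to hold the table's current sequences in upper case.
    """
    seq = seq.upper()
    problems = []
    bad = sorted(set(seq) - ALPHABETS[table["type"]])
    if bad:
        problems.append(f"invalid characters for {table['type']}: {''.join(bad)}")
    if seq in existing:
        problems.append("sequence already defined")
    return problems
```

For example, `load_tables(CONFIG)["vr2_alleles"]` returns `{"locus": "VR2", "type": "nucleotide"}`. Because the main script and the curator's script would both call the same loader, a change to the XML affects both at once.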
The curator's interface script reads the same XML file as the main website script, so the two remain in sync whenever the configuration is modified. If the path of a script has been defined in the XML file, the curator can run that script by activating a button on the curator's interface. Such a script can, for instance, update static web pages from the database without the curator needing administrator access to the system. The software produces standards-compliant XHTML and uses cascading style sheets (CSS), so the style of the resulting website can be modified easily. Additionally, header and footer HTML files can be defined that are added to the generated pages, so that they conform to the layout and look-and-feel of a particular website.
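A minimal Python sketch of the header-and-footer mechanism: generated results are placed between optional site-specific HTML fragments inside an XHTML page that links a stylesheet. The function name, the default `style.css` path and the fixed page title are hypothetical; agdbNet's own Perl implementation may assemble its pages differently.

```python
from pathlib import Path
from typing import Optional


def wrap_page(body_html: str, header_path: Optional[Path] = None,
              footer_path: Optional[Path] = None,
              css_href: str = "style.css") -> str:
    """Surround generated output with optional site header/footer fragments.

    HYPOTHETICAL helper: file locations and CSS path are illustrative only.
    """
    header = header_path.read_text(encoding="utf-8") if header_path else ""
    footer = footer_path.read_text(encoding="utf-8") if footer_path else ""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
        '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Results</title>'
        f'<link rel="stylesheet" type="text/css" href="{css_href}"/></head>\n'
        # Site header first, then the script's output, then the site footer.
        f"<body>\n{header}\n{body_html}\n{footer}\n</body></html>\n"
    )
```

Keeping layout in external fragments and CSS means a site can restyle its pages without touching the script itself.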

Results and discussion

Public databases using this software

The software is in use on a number of public bacterial typing databases. The first site to be implemented was the PorA variable region database for subtyping Neisseria meningitidis [1,11], a major cause of meningitis and septicaemia. The PorA protein is a major typing target and vaccine candidate. This scheme defines the peptide variants at two variable regions (VR1 and VR2). Either nucleotide or peptide sequences can be queried against both loci, either singly or, more usually, together. If a variant is identified, a hyperlink leads to a page describing all the information known about it, including antibody reactivities, GenBank and PubMed accession numbers and links, and the submitter information (figure 1). Along with the peptide information, a table listing known isolates expressing the variant is shown, and further information about these isolates can be displayed by following the hyperlinks from this table. The software also queries the external PubMLST isolate database [10,12] and lists any isolates in it that match [see Additional file 1: poravr.xml for the XML description of this database].
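The following Python sketch illustrates querying several loci at once: a single query is checked against the variant table of every locus, so both variable regions can be identified from one sequence that spans them. It substitutes exact substring matching for the BLAST searches agdbNet actually performs, and the locus tables and variant names are invented placeholders, not real PorA VR data.

```python
# INVENTED toy variant tables; not real PorA VR1/VR2 peptides or designations.
LOCI = {
    "VR1": {"v1-1": "GASTKD", "v1-2": "GASNKD"},
    "VR2": {"v2-1": "HFVQQT", "v2-2": "YVDEQSKY"},
}


def identify_variants(query: str, loci: dict = LOCI) -> dict:
    """For each locus, list the variants whose sequence occurs exactly in the query.

    Exact matching is a simplified stand-in for a BLAST search per locus.
    """
    query = "".join(query.split()).upper()
    return {
        locus: sorted(name for name, seq in variants.items() if seq in query)
        for locus, variants in loci.items()
    }
```

For example, `identify_variants("MRKGASTKDLLPHFVQQTRR")` reports `v1-1` for VR1 and `v2-1` for VR2 from the one query.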

Figure 1

Screenshot: results following a hyperlink for a particular variant sequence. The resulting page lists all known information about the variant, such as who first reported it, where it has been published, its accession numbers and its monoclonal antibody reactivity. Isolates stored within the PorA database that express the variant are shown, followed by the results of a similar search against the external PubMLST isolate database.

Databases for other Neisseria antigens are also available [11]: i) a nucleotide database for the two different classes of the typing antigen PorB [3]; ii) a variable region peptide database for a putative vaccine candidate, FetA [2,3]. A database containing both alleles and peptides for the short variable region of the FlaA typing antigen of Campylobacter, an organism frequently implicated in cases of food poisoning, is also available [4,13]. Investigating the diversity of the FlaA protein, coupled with broader typing methods, can enhance the discrimination of isolates during outbreak investigations. Recently, a database for a sensitive subtyping scheme for Streptococcus equi, the causative agent of strangles in horses, has been set up that indexes the variation found in the SeM protein [5,14] (figures 2 and 3). This scheme has been used to investigate potential cases of disease related to the administration of live attenuated S. equi vaccine.

Figure 2

Screenshot: querying the Streptococcus equi SeM database. A nucleotide sequence has been pasted into the web form, and the option to query against both the nucleotide and peptide sequence tables has been selected.

Figure 3

Screenshot: results of a search of the Streptococcus equi SeM database. The software has identified that the query sequence is not known but is most similar to allele 3, and shows the nucleotide differences. The BLAST search against the peptide table has produced a number of partial matches, and the BLAST alignment output can be viewed by clicking the appropriate hyperlink. All allele and peptide numbers are hyperlinked to pages with more detailed information about each sequence.
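Figure 3 shows the software reporting that an unknown query is most similar to allele 3 and listing the nucleotide differences. A simplified Python sketch of that idea counts mismatches against equal-length alleles and reports each difference. It is a stand-in for the BLAST search and EMBOSS alignment the software uses, so it ignores insertions and deletions, and the allele sequences are invented.

```python
# INVENTED toy alleles of equal length, standing in for a nucleotide table.
ALLELES = {
    1: "ATGAAACGTCTG",
    2: "ATGAAACGTATG",
    3: "ATGCAACGTCTA",
}


def nearest_allele(query: str, alleles: dict = ALLELES):
    """Return (allele_id, differences) for the closest same-length allele.

    Differences are (1-based position, allele base, query base) tuples.
    Returns (None, None) if no allele has the query's length. Ties are
    resolved in favour of the lowest allele number.
    """
    query = query.upper()
    best_id, best_diffs = None, None
    for allele_id, seq in sorted(alleles.items()):
        if len(seq) != len(query):
            continue  # this sketch only compares ungapped, equal-length sequences
        diffs = [(i + 1, a, q) for i, (a, q) in enumerate(zip(seq, query)) if a != q]
        if best_diffs is None or len(diffs) < len(best_diffs):
            best_id, best_diffs = allele_id, diffs
    return best_id, best_diffs
```

An empty difference list means the query matches a known allele exactly; anything else is a candidate new variant for the curator to assess.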

Interconnected distributed databases

Because databases hosted using this software share a common platform, other websites can readily retrieve information from them, creating a network of interconnected distributed databases. This can be seen in practice on the multilocus sequence typing (MLST) databases for Neisseria [10,12]. If an isolate has been genetically subtyped, the MLST database software automatically queries the PorA variable region database and displays the peptide as a hyperlink to the page on the PorA website describing that peptide. The interconnection works in both directions, as the PorA website can also query pubmlst.org to list isolates that contain a particular subtype. Each of these interconnections can be configured in the software with a single line in the XML description.
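The cross-database hyperlinks described above can be sketched in Python as a function that turns a stored designation into a link to the remote page describing it. The base URL, the query parameter names and the function itself are hypothetical; the actual linking in agdbNet and the MLST software is configured in their XML descriptions and may use a different URL scheme.

```python
import html
from urllib.parse import urlencode

# HYPOTHETICAL remote page (example.org is a reserved placeholder domain);
# the parameter names below are invented as well.
REMOTE_VARIANT_PAGE = "https://antigen-db.example.org/query"


def variant_link(database: str, locus: str, designation: str,
                 base_url: str = REMOTE_VARIANT_PAGE) -> str:
    """Turn a locally stored designation into a hyperlink to the remote
    page that describes it, escaped for use inside XHTML."""
    params = urlencode({"db": database, "page": "variant",
                        "locus": locus, "id": designation})
    href = html.escape(f"{base_url}?{params}")  # '&' becomes '&amp;'
    text = html.escape(f"{locus}: {designation}")
    return f'<a href="{href}">{text}</a>'
```

Because only the base URL and parameter names vary between sites, a single configuration value per remote database is enough to drive this kind of link.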

Conclusion

This software enables the rapid construction of web-based antigen databases. These databases can contain multiple sets of nucleotide or peptide sequences, or both, and may be queried using nucleotide or peptide sequences. Multiple loci may be queried simultaneously, an advantage for typing schemes that involve separate variable regions located within a single larger sequence. The software has been successfully deployed in a number of applications that are used daily by the worldwide public health and research communities.

Availability and requirements

Project name: AgdbNet
Project home page:
Operating systems: Linux/UNIX
Programming language: Perl
Other requirements: Apache; PostgreSQL; CGI, DBI and XML::Parser::PerlSAX Perl modules; BLAST
License: GNU GPL
Any restrictions to use by non-academics: none

A distribution archive of the software (version 1.0.0) is available with this manuscript [see Additional file 2].

Authors' contributions

KAJ carried out the programming work and drafted the manuscript. MCJM conceived the software development and participated in defining its specification. Both authors read and approved the final manuscript.

Additional File 1

XML (text) file showing the configuration for the Neisseria PorA VR database.

Additional File 2

Distribution archive of the software (version 1.0.0).

References

1. EMBOSS: the European Molecular Biology Open Software Suite.

Authors: P Rice; I Longden; A Bleasby
Journal: Trends Genet Date: 2000-06 Impact factor: 11.639

2. Database-driven multi locus sequence typing (MLST) of bacterial pathogens.

Authors: M S Chan; M C Maiden; B G Spratt
Journal: Bioinformatics Date: 2001-11 Impact factor: 6.937

3. The Bioperl toolkit: Perl modules for the life sciences.

Authors: Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney
Journal: Genome Res Date: 2002-10 Impact factor: 9.043

4. Distribution of surface protein variants among hyperinvasive meningococci: implications for vaccine design.

Authors: Rachel Urwin; Joanne E Russell; Emily A L Thompson; Edward C Holmes; Ian M Feavers; Martin C J Maiden
Journal: Infect Immun Date: 2004-10 Impact factor: 3.441

Review 5. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal: Nucleic Acids Res Date: 1997-09-01 Impact factor: 16.971

6. Sequence variation of the SeM gene of Streptococcus equi allows discrimination of the source of strangles outbreaks.

Authors: Charlotte Kelly; Maxine Bugg; Carl Robinson; Zoe Mitchell; Nick Davis-Poynter; J Richard Newton; Keith A Jolley; Martin C J Maiden; Andrew S Waller
Journal: J Clin Microbiol Date: 2006-02 Impact factor: 5.948

7. Sequence typing and comparison of population biology of Campylobacter coli and Campylobacter jejuni.

Authors: Kate E Dingle; Frances M Colles; Daniel Falush; Martin C J Maiden
Journal: J Clin Microbiol Date: 2005-01 Impact factor: 5.948

8. Antigenic diversity of meningococcal enterobactin receptor FetA, a vaccine component.

Authors: Emily A L Thompson; Ian M Feavers; Martin C J Maiden
Journal: Microbiology Date: 2003-07 Impact factor: 2.777

9. PorA variable regions of Neisseria meningitidis.

Authors: Joanne E Russell; Keith A Jolley; Ian M Feavers; Martin C J Maiden; Janet Suker
Journal: Emerg Infect Dis Date: 2004-04 Impact factor: 6.883

10. mlstdbNet - distributed multi-locus sequence typing (MLST) databases.

Authors: Keith A Jolley; Man-Suen Chan; Martin C J Maiden
Journal: BMC Bioinformatics Date: 2004-07-01 Impact factor: 3.169
