
ShiBASE: an integrated database for comparative genomics of Shigella.

Jian Yang, Lihong Chen, Jun Yu, Lilian Sun, Qi Jin.

Abstract

Among the major enteric bacterial pathogens, Shigella displays extreme genome diversity and dynamics, which poses a challenge for comparative genomic studies. To facilitate further studies in this area, we have constructed an integrated online database, ShiBASE (http://www.mgc.ac.cn/ShiBASE/), which contains Shigella genomic sequences of four species and additional comparative genomic hybridization (CGH) data of 43 serotypes. ShiBASE offers online comparative analysis of DNA sequences, gene orders, metabolic pathways and virulence factors. In addition, ShiBASE has a newly developed online comparative visualization service, Shi-align, which enables the alignment of any query sequence with the reference genome sequences.

Year:  2006        PMID: 16381896      PMCID: PMC1347396          DOI: 10.1093/nar/gkj033

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Shigella species are the causative agents of bacillary dysentery or shigellosis, which remains a threat to public health worldwide, particularly in developing countries. An estimated 160 million episodes of shigellosis occur per annum, with 1.1 million deaths, the majority of them in children younger than 5 years of age (1). The clinical manifestations of shigellosis range from mild diarrhoea to severe forms of the disease, which include fever, abdominal cramps, and blood, pus and mucus in the stools (2). Much of Shigella pathogenesis seems to be the result of multiple effects of its plasmid-borne type III secretion system (TTSS) encoded by mxi and spa genes. The TTSS secretes many effector proteins, including IpaA, IpaB, IpaC and IpgD, which can mediate epithelial signalling events, cytoskeletal rearrangements and other actions (3). The plasmid also encodes a 120 kDa outer membrane protein called IcsA, responsible for actin-based motility leading to intra- and inter-cellular spread of the bacteria by the binding of N-WASP (4). Furthermore, in addition to the virulence plasmid, many chromosomal loci, such as the Shigella pathogenicity islands SHI-1 and SHI-2, also contribute to virulence (5,6). For a better understanding of the pathogenesis and epidemiology of Shigella infection, comparative genomics has emerged as an important area of research. On the basis of the somatic O-antigen, Shigella is divided into four species: S.dysenteriae, S.flexneri, S.boydii and S.sonnei. Our group and others have determined the complete genome sequences of five Shigella strains that cover all four species (7–9). In addition, comparative genomic hybridization (CGH) microarray analysis has been performed on 43 Shigella strains of different serotypes (J. Peng, X. Zhang, J. Yang, J. Wang, E. Yang, W. Bin, C. Wei, M. Sun and Q. Jin, manuscript in preparation), which revealed extensive diversity and dynamics between Shigella genomes.
Hence, an integrated database capable of dealing with complex genome comparison is urgently needed in addition to currently available online databases, e.g. coliBASE (10), which is limited to pairwise genome comparison. We present here ShiBASE (http://www.mgc.ac.cn/ShiBASE/), a freely accessible online resource that is focused on the comparative genomics of Shigella. ShiBASE is able to summarize large volumes of genomic sequence and comparative genomic data in a visually intuitive format. ShiBASE also provides a novel Shi-align service for visualizing BLAST hits between query sequences and known Shigella genomes, thus allowing rapid examination of synteny, genomic rearrangements and putative genomic islands.

DATABASE CONTENT AND CONSTRUCTION

There are five completed Shigella genomes available. The S.flexneri 2a strain 2457T was excluded from the dataset of ShiBASE because its genome is highly similar to that of another serotype 2a strain, 301 (Sf301). Variation between the genomes of 2457T and Sf301 is limited to the number of IS elements and some single-nucleotide differences (8). The four Shigella genomes included in the current release of ShiBASE are S.dysenteriae 1 strain 197 (Sd197), S.boydii 4 strain 227 (Sb227), S.sonnei strain 046 (Ss046) and S.flexneri 2a strain 301. The genome sequences and annotation files from GenBank (11) were converted to tabular files by using the Bioperl toolkit (12). Genes from Escherichia coli K12 strain MG1655 (accession no. U00096) were used as references for defining orthologue groups. Pairwise orthologues were identified by the INPARANOID program with default parameters (13), and were further inspected manually. Multi-copy genes related to mobile DNA, such as IS elements and bacteriophages, were excluded. Protein sequences from each of the orthologue groups were processed in ClustalW (14) to produce alignments and guide trees. Whole genome sequence comparisons were performed by BLASTN (15) and the information for each maximal segment pair (MSP) was deposited into the database for web illustration. The enzyme commission information of each genome as well as metabolic pathway maps were retrieved from the KEGG ftp server (16), and then revised by manual curation for dynamic web presentation. ShiBASE was built on a RedHat Linux 9.0 operating system, and MySQL was used to construct the database for storage of information, including genome annotations, orthologue groups, sequence comparisons and CGH results. The Perl programming language and several modules (DBI, GD and CGI) were used to generate dynamic web pages. Client-side JavaScript and Java applets were also applied in some cases for data presentation purposes.
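
The step of depositing BLASTN MSPs into a relational database can be illustrated with a minimal sketch. It is not the ShiBASE implementation, which used Perl and MySQL; here Python and an in-memory SQLite database stand in for both. The sketch assumes that hits were exported in the 12-column tabular layout produced by BLAST+ `-outfmt 6`, and the table name `msp`, its column names and the two demo hit lines are hypothetical.

```python
import sqlite3

# Column order of BLAST tabular output (BLAST+ -outfmt 6 defaults).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def parse_msp(line):
    """Turn one tab-separated hit line into a dict with typed values."""
    raw = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
    return {
        "query_id": raw["qseqid"],
        "subject_id": raw["sseqid"],
        "identity": float(raw["pident"]),
        "q_start": int(raw["qstart"]),
        "q_end": int(raw["qend"]),
        "s_start": int(raw["sstart"]),
        "s_end": int(raw["send"]),
    }


def store_msps(conn, lines):
    """Create the (hypothetical) msp table if needed and insert all hits."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS msp ("
        "query_id TEXT, subject_id TEXT, identity REAL, "
        "q_start INTEGER, q_end INTEGER, s_start INTEGER, s_end INTEGER)"
    )
    # Skip blank lines and comment lines.
    rows = [parse_msp(l) for l in lines if l.strip() and not l.startswith("#")]
    conn.executemany(
        "INSERT INTO msp VALUES (:query_id, :subject_id, :identity, "
        ":q_start, :q_end, :s_start, :s_end)",
        rows,
    )
    conn.commit()
    return len(rows)


# Made-up hit lines, for illustration only.
demo_lines = [
    "Sb227\tSf301\t98.50\t1200\t18\t0\t1\t1200\t5001\t6200\t0.0\t2100",
    "Sb227\tSf301\t91.20\t800\t70\t0\t3000\t3799\t9000\t9799\t0.0\t1000",
]
conn = sqlite3.connect(":memory:")
inserted = store_msps(conn, demo_lines)
high_identity = conn.execute(
    "SELECT q_start, q_end FROM msp WHERE identity > 95"
).fetchall()
```

Storing one row per MSP keeps the web layer simple: a comparison page only needs a range query on the coordinate columns.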

COMPARATIVE GENOMICS IN ShiBASE

The interspecies comparison of Shigella genomes may be carried out at several different levels within ShiBASE. One may compare basic genome features, such as genome size, IS elements, genome structure (inversions and deletions) and order of orthologous genes. One can also compare metabolic pathways and virulence factors among Shigella genomes. Moreover, the CGH data produced from analyses on 43 different serotypes of Shigella strains are also integrated within ShiBASE. By using the Shi-align visualization platform one can also perform comparative analysis on any given query sequences.

Sequence-based comparison

All Shigella and E.coli strains sequenced to date are closely related and share an essentially collinear common ‘backbone’ genome sequence. However, there are unique DNA sequences present in each of the Shigella and E.coli chromosomes (7). For graphically viewing a selected region across all genomes, we have set up a clickable web page in the ShiBASE interface. Figure 1A shows a comparison of a 20 kb segment involved in flagella biogenesis from all Shigella genomes. The reference genome (from Sb227 in this case) is presented on the top and the query genome sequences are aligned below. The pre-computed MSPs from BLASTN are indicated by red bars between each pair, and are additionally equipped with popup messages that provide details of the sequence alignment. Genome annotations, including genes, stable RNAs and IS elements, are also graphically illustrated for each region of synteny. The comparison range can be increased in size (up to 100 kb) and the window can slide along each genome for simple browsing. If necessary, users can reverse and display the complementary strand (see Sf301 in Figure 1A).
Figure 1

Sequence-based comparison in ShiBASE. (A) Shigella genome comparison view. Genome sequence of Sb227 was set as reference on the top. The current comparison window size is 20 kb and it can be increased by clicking the circular plus (‘+’) icon on the top left. Note that the reverse complementary strand of Sf301 was used in this comparison figure. Arrows represent genes, of which those with a white cross or with dashed lines that connect separate portions are pseudogenes owing to mutations or insertions, respectively. The compared region contains a set of flagellar genes (fli) and they were inactivated by different pseudogenes in all four genomes. The left part of the corresponding region in Ss046 (on the bottom) was translocated elsewhere. (B) Result of Shi-align by querying with a ∼17 kb segment from EHEC. Coloured bars between each pair represent MSPs from BLASTN of different identity values: red, >95%; pink, 90–95%; and orange, <90%.
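
The sliding comparison window and the strand reversal described above can be sketched as two small operations on pre-computed MSPs. This is only an illustration of the idea, not the code behind the ShiBASE pages; the `MSP` record, both function names and the example coordinates are hypothetical, and coordinates are assumed to be 1-based and inclusive with start not exceeding end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MSP:
    ref_start: int   # start on the reference genome (1-based, inclusive)
    ref_end: int     # end on the reference genome
    qry_start: int   # start on the query genome
    qry_end: int     # end on the query genome
    identity: float  # percent identity reported by BLASTN


def msps_in_window(msps, win_start, win_end):
    """Keep the MSPs whose reference interval overlaps [win_start, win_end]."""
    return [m for m in msps
            if m.ref_start <= win_end and m.ref_end >= win_start]


def flip_query(msp, genome_length):
    """Re-express the query interval on the reverse complementary strand."""
    return MSP(msp.ref_start, msp.ref_end,
               genome_length - msp.qry_end + 1,
               genome_length - msp.qry_start + 1,
               msp.identity)


# Made-up MSPs and a 20 kb window starting at position 1.
msps = [MSP(1000, 5000, 200, 4200, 98.0),
        MSP(30000, 31000, 9000, 10000, 92.0)]
visible = msps_in_window(msps, 1, 20000)
flipped = flip_query(msps[0], genome_length=100000)
```

Widening the window up to 100 kb only changes the two window arguments; the overlap test stays the same.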

As shown in Figure 1A, there are different mutations in the fli genes across all Shigella strains, which explain genetically why members of the Shigella species are non-motile. An additional surface structure, namely fimbriae, is also eliminated from the members of the Shigella species by a different gene inactivation (data not shown). These are both examples of the genome interrogation that one may perform using ShiBASE, thus adding insight into the biology and the evolutionary history of the Shigella species. ShiBASE also allows investigation of variation in genome structure at the amino acid level by gene order comparison, which is particularly useful for detecting orthologous genes. Orthologues are genes that have evolved vertically from a single ancestral gene in different genomes, and they often retain similar biological functions in the present-day organisms (13). Generally, protein sequence comparison is more sensitive because codon usage often leads to DNA sequence variations. For example, although the coding sequences of fliC show little similarity among Shigella genomes (Figure 1A), their protein sequences share 46–62% overall identity. In order to provide a uniform interface, the comparison of orthologue order is presented in a similar style to the comparison of DNA sequences described above, except that red bars are used here to link orthologue pairs. The multiple alignment and guide tree created by ClustalW for each orthologue group are directly linked from the individual graphic comparison pages.
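
The remark that protein comparison is more sensitive than DNA comparison can be made concrete with a toy case. The two 12-nucleotide sequences below are invented for illustration and are not taken from any Shigella genome; they use synonymous codons of the standard genetic code, so the DNA differs while the encoded peptide is the same. The codon table is deliberately partial and covers only the codons used here.

```python
# Partial standard genetic code: only the codons used in this example.
CODONS = {"ATG": "M", "GCT": "A", "GCC": "A",
          "CTG": "L", "TTA": "L", "AAA": "K", "AAG": "K"}


def translate(dna):
    """Translate an in-frame DNA string using the partial table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(a, b):
    """Percent of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


seq1 = "ATGGCTCTGAAA"  # toy sequence, encodes M-A-L-K
seq2 = "ATGGCCTTAAAG"  # same peptide, different synonymous codons

dna_identity = percent_identity(seq1, seq2)
protein_identity = percent_identity(translate(seq1), translate(seq2))
```

In this constructed case the nucleotide identity falls to about 67% while the peptides remain identical; real orthologues such as fliC also carry amino acid changes, so their protein identity is lower than 100%.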

Function-based comparison

Analysis of metabolic pathways is useful in illuminating the complexity of the Shigella metabolic patterns for the explanation of biochemical characteristics that distinguish Shigella from other enteric bacteria. By adopting the KEGG pathway maps, which offer graphic representations for most metabolic pathways (16), we have generated a comparative pathway mapviewer. It combines individual pathways of E.coli K12 MG1655 and Shigella strains and displays the results in one map. For instance, Shigella strains do not synthesize lysine decarboxylase since the absence of lysine decarboxylase activity is important to their virulent lifestyle (17). These data can be easily accessed and browsed on our web page. Another example is given in Supplementary Figure 1, which is the comparative view of the tyrosine metabolism pathway. It shows that Sf301, Sb227 and Ss046 possess the hpa operon, which offers them the ability to catabolize the aromatic compound 4-hydroxyphenylacetate (4-HPA), whereas Sd197 lacks the hpa operon (18). Although the four species of Shigella were originally divided on the basis of the O-specific polysaccharide of the LPS, they additionally demonstrate distinctive differences in their pathological and epidemiological features. Furthermore, S.dysenteriae serotype 1 alone possesses the cytotoxic Shiga toxin and causes disease with neurological and renal complications (19). To highlight the diversity of virulence factors present in each genome, we collated all known and putative virulence factors in a table, which links each virulence gene to detailed pages, allowing for further analysis or DNA/protein sequence retrieval.
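
Combining the pathways of several genomes into one map amounts to building a presence/absence table of enzymes. The sketch below shows that idea under stated assumptions: the function names are hypothetical, the per-genome EC sets are toy data rather than KEGG content, and `EC_shared` is a made-up placeholder for any enzyme found in every genome. EC 4.1.1.18 is the enzyme commission number of lysine decarboxylase, the activity the text reports as absent from Shigella.

```python
def presence_table(ec_by_genome):
    """Map each EC number to {genome: True/False} across all genomes."""
    all_ecs = sorted(set().union(*ec_by_genome.values()))
    return {ec: {genome: ec in ecs for genome, ecs in ec_by_genome.items()}
            for ec in all_ecs}


def differing(table):
    """EC numbers that are present in some genomes but not in others."""
    return [ec for ec, row in table.items() if len(set(row.values())) > 1]


# Toy input: lysine decarboxylase only in the E.coli reference.
ec_by_genome = {
    "MG1655": {"4.1.1.18", "EC_shared"},
    "Sf301": {"EC_shared"},
    "Sd197": {"EC_shared"},
}
table = presence_table(ec_by_genome)
variable_enzymes = differing(table)
```

A map viewer could then colour each enzyme box according to its row in the table, which is one way to display all genomes in a single map.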

Extended genome content comparison

In order to extend genomic comparison beyond genome sequences, we have integrated the newly available CGH data into ShiBASE. The CGH analysis includes 43 Shigella strains of different serotypes. The DNA microarray used to generate these data contains 5051 non-redundant genes from E.coli K12 MG1655 and all known Shigella genomes. Users may browse or search the CGH data and display the results in serological or clustering order. Recent phylogenetic analyses of eight housekeeping genes have grouped all Shigella strains into three main clusters with five outliers (20). Since the CGH results generate a similar clustering order, the information is beneficial in demonstrating a correlation between genome structure and phylogenetic relationship. In order to facilitate further comparative analysis of the CGH results, an auto filter was set up to let users rapidly examine genes commonly shared or lost within each cluster or serogroup. For example, screening the CGH results shows that a total of 228 genes present in the other S.flexneri strains are absent from S.flexneri 6. Most of the absent genes are related to LPS biosynthesis (such as rfbEFGIJ), outer membrane proteins (such as ompG, fhuA and nmpC) or enzymes (such as the cai operon necessary for carnitine metabolism). In contrast, S.flexneri 6 alone possesses the gsp genes encoding a novel type II secretion system. These findings may help to explain why S.flexneri 6 behaves differently from other S.flexneri strains and lies within a different cluster.
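
The kind of question the auto filter answers can be sketched as set logic over present/absent calls. This is an assumption about the general approach, not a description of the ShiBASE filter itself; the function names, the strain labels `Sf2a`, `Sf3a` and `Sf6`, the gene label `geneX` and all calls below are hypothetical toy data, with a few gene names borrowed from the text.

```python
def absent_only_in(calls, target, group):
    """Genes absent from `target` but present in every other strain of `group`."""
    others = [s for s in group if s != target]
    return sorted(gene for gene, row in calls.items()
                  if not row[target] and all(row[s] for s in others))


def unique_to(calls, target, group):
    """Genes present in `target` but absent from every other strain of `group`."""
    others = [s for s in group if s != target]
    return sorted(gene for gene, row in calls.items()
                  if row[target] and not any(row[s] for s in others))


# Toy CGH calls: True = gene detected, False = not detected.
group = ["Sf2a", "Sf3a", "Sf6"]
calls = {
    "rfbE":  {"Sf2a": True,  "Sf3a": True,  "Sf6": False},
    "ompG":  {"Sf2a": True,  "Sf3a": True,  "Sf6": False},
    "gsp":   {"Sf2a": False, "Sf3a": False, "Sf6": True},
    "geneX": {"Sf2a": True,  "Sf3a": True,  "Sf6": True},
}
lost_in_sf6 = absent_only_in(calls, "Sf6", group)
only_in_sf6 = unique_to(calls, "Sf6", group)
```

Applied to real calls for a whole cluster or serogroup, the same two queries would yield lists comparable to the 228 absent genes mentioned above.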

Online sequence comparison visualization service Shi-align

BLAST is a very powerful tool for finding sequence similarities and it has been widely used in database interrogation and sequence comparisons. However, when performing long sequence alignments, the voluminous and complex BLAST textual output is often difficult for biologists to read and analyse. Hence, several stand-alone programs, such as ACT () and GenomeComp (21), have been developed to tackle these issues. In line with these programs, we have developed an online tool named Shi-align. Shi-align allows the visualization of sequence comparisons, and is incorporated within ShiBASE. Shi-align permits users to compare any sequence fragments with known Shigella genomes and in turn generates a graphic alignment view (Figure 1B) as an alternative to the textual output of BLAST. This tool is particularly useful for viewing the overall organization between query sequences and known Shigella genome sequences. Figure 1B shows a ∼17 kb segment from enterohaemorrhagic E.coli (EHEC) encoding a putative adhesin, aligned with corresponding sequences from various Shigella genomes in Shi-align. This graphical alignment demonstrates that the sequences from Sb227 and Sf301, with the exception of IS insertions, are essentially collinear with that of EHEC, whereas the aligned sequences from Ss046 and Sd197 have obvious DNA rearrangements with respect to the EHEC sequence.
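
The colour coding of MSPs in the Shi-align view (Figure 1B: red, >95%; pink, 90–95%; orange, <90%) can be written as a small classifier. The function names are hypothetical, and assigning the boundary values 95% and 90% to the pink class is an assumption, since the legend does not say which class they belong to.

```python
from collections import Counter


def msp_colour(identity):
    """Map a percent identity to the colour classes of the Figure 1B legend."""
    if identity > 95:
        return "red"
    if identity >= 90:   # assumption: 90 and 95 themselves count as pink
        return "pink"
    return "orange"


def colour_summary(identities):
    """Count how many MSPs fall into each colour class."""
    return Counter(msp_colour(i) for i in identities)


# Made-up identity values for a handful of MSPs.
summary = colour_summary([99.1, 97.4, 95.0, 92.3, 88.0])
```

A summary like this gives a quick numerical impression of how conserved a query region is before the graphic view is inspected.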

CONCLUSIONS AND FUTURE DIRECTIONS

As developments in sequencing techniques and the ubiquity of fast computers have taken microbial genomics to the genus scale (22), resources at the genus scale are also in demand. ShiBASE, as well as other recently developed databases such as coliBASE (10) and MolliGen (23), are all examples that aim to fulfil this requirement. The current release of ShiBASE is intended as a comprehensive online resource for Shigella and offers a platform for further comparative genomics studies on this important human pathogen. Nevertheless, the close genetic relationship between Shigella and E.coli is already well established. Moreover, an E.coli pathovar, named enteroinvasive E.coli (EIEC), is biochemically, genetically and pathogenically closely related to Shigella spp. EIEC is considered an intermediate type in evolution between E.coli and Shigella. So, in the future, ShiBASE will not only include more Shigella data (such as proteomics data) but also collect genomes of all types of pathogenic E.coli, especially EIEC.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR online.
  23 in total

Review 1.  Microbes and microbial toxins: paradigms for microbial-mucosal interactions III. Shigellosis: from symptoms to molecular pathogenesis.

Authors:  P J Sansonetti
Journal:  Am J Physiol Gastrointest Liver Physiol       Date:  2001-03       Impact factor: 4.052

2.  MolliGen, a database dedicated to the comparative genomics of Mollicutes.

Authors:  Aurélien Barré; Antoine de Daruvar; Alain Blanchard
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

3.  Genomics at the genus scale.

Authors:  Jacques Ravel; Claire M Fraser
Journal:  Trends Microbiol       Date:  2005-03       Impact factor: 17.079

4.  "Black holes" and bacterial pathogenicity: a large genomic deletion that enhances the virulence of Shigella spp. and enteroinvasive Escherichia coli.

Authors:  A T Maurelli; R E Fernández; C A Bloch; C K Rode; A Fasano
Journal:  Proc Natl Acad Sci U S A       Date:  1998-03-31       Impact factor: 11.205

5.  Global burden of Shigella infections: implications for vaccine development and implementation of control strategies.

Authors:  K L Kotloff; J P Winickoff; B Ivanoff; J D Clemens; D L Swerdlow; P J Sansonetti; G K Adak; M M Levine
Journal:  Bull World Health Organ       Date:  1999       Impact factor: 9.408

6.  coliBASE: an online database for Escherichia coli, Shigella and Salmonella comparative genomics.

Authors:  Roy R Chaudhuri; Arshad M Khan; Mark J Pallen
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

Review 7.  Shigella flexneri infection: pathogenesis and vaccine development.

Authors:  Amy V Jennison; Naresh K Verma
Journal:  FEMS Microbiol Rev       Date:  2004-02       Impact factor: 16.408

8.  Genome dynamics and diversity of Shigella species, the etiologic agents of bacillary dysentery.

Authors:  Fan Yang; Jian Yang; Xiaobing Zhang; Lihong Chen; Yan Jiang; Yongliang Yan; Xudong Tang; Jing Wang; Zhaohui Xiong; Jie Dong; Ying Xue; Yafang Zhu; Xingye Xu; Lilian Sun; Shuxia Chen; Huan Nie; Junping Peng; Jianguo Xu; Yu Wang; Zhenghong Yuan; Yumei Wen; Zhijian Yao; Yan Shen; Boqin Qiang; Yunde Hou; Jun Yu; Qi Jin
Journal:  Nucleic Acids Res       Date:  2005-11-07       Impact factor: 16.971

9.  Activation of the CDC42 effector N-WASP by the Shigella flexneri IcsA protein promotes actin nucleation by Arp2/3 complex and bacterial actin-based motility.

Authors:  C Egile; T P Loisel; V Laurent; R Li; D Pantaloni; P J Sansonetti; M F Carlier
Journal:  J Cell Biol       Date:  1999-09-20       Impact factor: 10.539

10.  GenBank.

Authors:  Dennis A Benson; Ilene Karsch-Mizrachi; David J Lipman; James Ostell; David L Wheeler
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

