
Data on genome sequencing, analysis and annotation of a pathogenic Bacillus cereus 062011msu.

Rashmi Rathy1, Sayan Paul1, Vasanthakumar Ponesakki1, Paulkumar Kanniah1, Suriya Prabha Muthu1, Arun Arumugaperumal1, Emmanuel Joshua Jebasingh Sathiya Balasingh Thangapandi1, Subburathinam Balakrishnan1, Rajendhran Jeyaprakash2, Sudhakar Sivasubramaniam1.   

Abstract

Bacillus species 062011msu is a harmful pathogenic strain that causes abscessation in sheep and goat populations, as studied by Mariappan et al. (2012) [1]. The organism specifically targets female sheep and goats and reduces milk and meat production. In the present study, we performed whole genome sequencing of the pathogenic isolate on the Ion Torrent sequencing platform and generated 458,944 raw reads with an average length of 198.2 bp. The genome sequence was assembled, annotated and analysed for the genetic islands, metabolic pathways, orthologous groups, virulence factors and antibiotic resistance genes associated with the pathogen. The 16S rRNA sequencing study and the genome sequence comparison data confirmed that the strain belongs to the species Bacillus cereus and exhibits 99% sequence homology with the genomes of B. cereus ATCC 10987 and B. cereus FRI-35. Hence, we have renamed the organism Bacillus cereus 062011msu. The Whole Genome Shotgun (WGS) project has been deposited at DDBJ/ENA/GenBank under the accession NTMF00000000 (BioProject: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA404036; BioSample: SAMN07629099).


Keywords:  Abscessation; Bacillus cereus; Genome sequencing; Virulence factors

Year:  2018        PMID: 29876371      PMCID: PMC5988026          DOI: 10.1016/j.dib.2017.12.054

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

Bacillus cereus 062011msu is a deadly pathogenic bacterium known for causing abscesses, mainly in female sheep and goats. The genome sequence resource and its annotation details can therefore be used to understand the pathogenicity of the bacterium, for the benefit of farmers who rear sheep and goats.

The genome annotation data of Bacillus cereus 062011msu provide a broad overview of the subsystem features, metabolic pathways, orthologous groups, virulence factors and antibiotic resistance genes associated with the genome of the species.

Most of the unique genes of the species were found to be clustered in ten genetic islands. In this study we provide a detailed analysis of the genes clustered on the genetic islands.

The data obtained from 16S rRNA analysis and genome sequence comparison with other Bacillus species provide significant information regarding the identification and taxonomic classification of this new bacterial strain. Although a previous study using partial 16S rRNA sequences reported the pathogen to be genetically similar to Bacillus anthracis [1], the whole genome data confirmed that the strain in fact belongs to the species Bacillus cereus and is phylogenetically related to B. cereus ATCC 10987 and B. cereus FRI-35.

The entire genome dataset can be used further to determine the genes and biochemical pathways related to the pathogenicity (abscess formation) of the strain and to develop new antimicrobial drugs against the pathogen.

Data

The data presented here cover the genome sequencing, assembly, annotation and comparative analysis of the pathogenic bacterium Bacillus cereus 062011msu. Table 1 gives the summary statistics of the draft genome assembly of B. cereus 062011msu. The length and Phred quality score distributions of the raw and filtered reads are illustrated in Supplementary Fig. S1. Fig. 1 shows the 10 genetic islands predicted in the genome of the isolate. The details of the genes clustered on the genetic islands are given in Supplementary Table S1. Fig. 2 shows the subsystem distribution of the B. cereus 062011msu genome based on RAST genome annotation. The complete list of the RAST annotated genes is given in Supplementary Table S2. Fig. 3 gives an overview of the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways associated with the annotated genome sequence. Fig. 4 shows the Clusters of Orthologous Groups (COG) distribution of the protein coding genes obtained from RAST annotation. Table 2 lists the virulence factors (with complete homology) identified in the annotated genome dataset. The antibiotic resistance genes identified in the pathogen are given in Supplementary Table S3. The top 20 closest neighboring strains of Bacillus cereus 062011msu based on RAST annotation are listed in Supplementary Table S4. Supplementary Table S5 lists the top 10 species showing maximum sequence homology with the genome of Bacillus cereus 062011msu according to BLAST genome alignment. Fig. 5 shows the phylogenetic tree constructed from the 16S rRNA comparison of the strain with its closely related homologs. The phylogenetic analysis data obtained from the 23S rRNA comparison are depicted in Fig. S2. Fig. 6 shows the annotated circular genome map of B. cereus 062011msu obtained from DNAPlotter.
Table 1

Summary statistics of Bacillus cereus 062011msu genome assembly.

Parameter | Value
No. of raw reads | 458,944
Average quality PHRED-score Q20 (before trimming) | 26,143 (abs)
Average quality PHRED-score Q25 (before trimming) | 45,447 (abs)
Average quality PHRED-score Q30 (before trimming) | 21,587 (abs)
No. of clean reads | 432,619
Average length of clean reads | 178.08 bp
Average quality PHRED-score Q20 (after trimming) | 5096 (abs)
Average quality PHRED-score Q25 (after trimming) | 54,520 (abs)
Average quality PHRED-score Q30 (after trimming) | 29,972 (abs)
Percentage of clean reads | 94.27%
Total no. of assembled contigs | 3200
Mean contig length | 1611 bp
N50 length | 2922 bp
N25 length | 5365 bp
Sequencing depth | 15X
GC% | 35.3%
Size of genome | 5,154,790 bp

Abs: number of sequences observed at that quality score.
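
The N50 and N25 values in Table 1 follow the usual assembly convention: with contigs sorted from longest to shortest, Nx is the length of the contig at which the running total first reaches x% of the total assembly length. A minimal Python sketch of that calculation is given below. The function name `nx_length` and the toy contig lengths are illustrative assumptions; this is not the authors' code, and the real values come from the 3,200-contig assembly.

```python
from typing import Iterable


def nx_length(contig_lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Return the Nx contig length (N50 for fraction=0.5, N25 for fraction=0.25)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    threshold = fraction * sum(lengths)
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= threshold:
            return length
    return lengths[-1]  # only reachable through floating-point rounding


# Toy example (hypothetical lengths, not the contigs of the real assembly).
toy_contigs = [10, 8, 6, 4, 2]
print(nx_length(toy_contigs, 0.5))   # N50 of the toy set
print(nx_length(toy_contigs, 0.25))  # N25 of the toy set
```

Because the longest contigs are counted first, N25 can never be smaller than N50, which agrees with the 5365 and 2922 reported in Table 1.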

Fig. 1

Genetic islands in Bacillus cereus 062011msu. A total of 10 genetic islands were predicted using IslandViewer 4.

Fig. 2

Subsystem distribution of Bacillus cereus 062011msu genome based on RAST annotation server.

Fig. 3

Pie chart representing the distribution of KEGG pathways associated with the genome of Bacillus cereus 062011msu. The pathways were obtained by annotating the protein coding sequences against the KEGG database.

Fig. 4

Clusters of Orthologous Groups (COG) distribution of the protein coding genes in Bacillus cereus 062011msu obtained from EggNog database.

Table 2

List of virulence factors identified in the annotated genome dataset of Bacillus cereus 062011msu.

SeqName | Start | Stop | Strand | Description | Length | e-Value | Sim mean
contig_1017 | 240 | 1 | − | Flagellum-specific ATP synthase | 240 | 6.88E−53 | 100
contig_1017 | 1466 | 1305 | − | Flagellar assembly protein H | 162 | 3.05E−15 | 100
contig_1017 | 2984 | 1980 | − | Flagellar motor switch protein G | 1005 | 0 | 100
contig_1017 | 3453 | 2998 | − | Flagellar MS-ring protein | 456 | 1.33E−40 | 100
contig_1026 | 114 | 1106 | + | UDP-glucose 4-epimerase | 993 | 0 | 100
contig_1050 | 1366 | 557 | − | UTP-glucose-1-phosphate uridylyltransferase | 810 | 0 | 100
contig_1052 | 162 | 16 | − | Flagellar biosynthesis protein FliR | 147 | 1.96E−12 | 100
contig_1052 | 305 | 192 | − | Flagellar biosynthesis protein FliQ | 114 | 4.26E−20 | 100
contig_1052 | 1144 | 503 | − | Flagellar biosynthesis protein FliP | 642 | 1.91E−130 | 100
contig_1052 | 1792 | 1601 | − | Flagellar motor switch protein FliN | 192 | 7.43E−39 | 100
contig_1108 | 28 | 837 | + | Flagellar motor protein MotB | 810 | 6.97E−171 | 100
contig_1140 | 301 | 1116 | + | Flagellar biosynthesis regulator FlhF | 816 | 0 | 100
contig_1140 | 1159 | 1497 | + | Flagellar basal body rod protein FlgG | 339 | 6.74E−76 | 100
contig_1208 | 569 | 387 | − | Flagellar hook protein FlgE | 183 | 5.15E−35 | 100
contig_1208 | 1752 | 1633 | − | Flagellar basal body rod modification protein | 120 | 1.19E−10 | 100
contig_1235 | 13 | 726 | + | Immune inhibitor A metalloprotease | 714 | 9.71E−152 | 100
contig_1277 | 720 | 535 | − | Glycosyl transferase, group 2 family protein | 186 | 8.07E−19 | 100
contig_1321 | 1747 | 1574 | − | Chemotaxis protein CheV | 237 | 1.09E−50 | 100
contig_1339 | 805 | 488 | − | Nonhemolytic enterotoxin NHE | 144 | 2.77E−25 | 100
contig_1358 | 522 | 379 | − | Thiol-activated cytolysin | 837 | 0 | 100
contig_1613 | 547 | 1092 | + | Capsular polysaccharide biosynthesis protein | 264 | 4.12E−52 | 100
contig_1634 | 912 | 76 | − | Hypothetical protein NEAT-type hemophore-mediated heme uptake system | 585 | 4.82E−92 | 100
contig_1673 | 392 | 93 | − | Flagellar hook-basal body protein FliE | 300 | 1.15E−69 | 100
contig_1673 | 834 | 538 | − | Flagellar basal body rod protein FlgC | 297 | 1.71E−68 | 100
contig_1912 | 261 | 37 | − | Non-hemolytic enterotoxin A | 225 | 2.03E−42 | 100
contig_1926 | 852 | 496 | − | Flagellar protein FliS, putative | 357 | 2.32E−74 | 100
contig_1926 | 1621 | 977 | − | Flagellar capping protein | 645 | 4.44E−148 | 100
contig_2028 | 167 | 21 | − | Flagellar motor protein MotS | 147 | 1.05E−29 | 100
contig_2028 | 1158 | 886 | − | Flagellar motor protein MotP | 273 | 1.14E−33 | 100
contig_2037 | 1066 | 857 | − | Chemotaxis protein methyltransferase CheR | 210 | 3.60E−45 | 100
contig_2173 | 701 | 471 | − | Flagellar biosynthesis protein FliR | 231 | 7.44E−40 | 100
contig_2221 | 38 | 262 | + | Hemolysin III | 225 | 8.80E−48 | 100
contig_2339 | 364 | 29 | − | Glycosyl transferase, group 1 family protein | 336 | 5.30E−77 | 100
contig_2555 | 475 | 242 | − | Flagellar hook-associated protein FlgL | 234 | 2.66E−48 | 100
contig_2723 | 583 | 191 | − | O-antigen polymerase wzy | 393 | 5.21E−71 | 100
contig_35 | 1687 | 995 | − | UDP-galactose phosphate transferase | 693 | 1.77E−168 | 100
contig_35 | 2379 | 1687 | − | Aminotransferase family protein | 693 | 6.51E−174 | 100
contig_439 | 4672 | 5061 | + | Transcriptional regulator PlcR, putative | 390 | 3.32E−90 | 100
contig_572 | 306 | 88 | − | NAD dependent epimerase/dehydratase family protein | 219 | 2.36E−44 | 100
contig_572 | 1657 | 284 | − | UDP-glucose 6-dehydrogenase | 1374 | 0 | 100
contig_572 | 2422 | 1751 | − | Polysaccharide transport protein, putative | 672 | 9.86E−147 | 100
contig_582 | 743 | 630 | − | Flagellar hook-associated protein FlgK | 114 | 2.91E−19 | 100
contig_681 | 545 | 45 | − | Internalin, putative | 501 | 1.98E−116 | 100
contig_694 | 5767 | 5450 | − | Channel protein, hemolysin III family | 318 | 5.60E−57 | 100
contig_716 | 152 | 9 | − | Cytotoxin K | 144 | 4.42E−24 | 100
contig_734 | 1083 | 232 | − | Phospholipase C | 852 | 0 | 100
contig_875 | 1156 | 1413 | + | Iron compound ABC transporter iron compound-binding protein | 258 | 3.82E−40 | 100
contig_875 | 2446 | 2847 | + | Iron compound ABC transporter permease protein | 402 | 8.95E−50 | 100
contig_876 | 733 | 329 | − | Capsular exopolysaccharide family protein | 405 | 1.42E−75 | 100
contig_882 | 3022 | 2846 | − | Flagellar motor switch protein | 177 | 1.30E−33 | 100
contig_882 | 4973 | 4479 | − | Chemotaxis histidine kinase | 495 | 5.15E−94 | 100
contig_899 | 782 | 63 | − | Capsular polysaccharide biosynthesis protein | 720 | 6.46E−178 | 100
contig_899 | 1383 | 1264 | − | Tyrosine-protein kinase | 120 | 9.58E−21 | 100
contig_903 | 2407 | 1010 | − | Flagellin | 1398 | 0 | 100
contig_965 | 15 | 983 | + | Non-hemolytic enterotoxin C | 969 | 0 | 100
contig_986 | 1110 | 1253 | + | Membrane-bound transcriptional regulator LytR | 144 | 2.26E−10 | 100

Fig. 5

Phylogenetic tree based on 16S rRNA comparison of Bacillus cereus 062011msu with its closely related homologs using the MEGA7 software.

Fig. 6

Circular genome map of Bacillus cereus 062011msu generated by DNAPlotter tool.


Experimental design, materials and methods

Genome sequencing, quality assessment and de novo assembly

Bacillus cereus 062011msu was isolated from the abscess tissue of affected female sheep and goats in Maruthamputhur village near the Alangulam region, Tirunelveli District, Tamil Nadu, India [1]. Whole genome sequencing of the isolate on the Ion Torrent Personal Genome Machine (Life Technologies, Carlsbad, CA) [2] produced 458,944 raw reads with an average length of 198.2 bp and a total size of 90,974,357 bp (90.974 Mb). The FastQC (version 0.11.5) software (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) [3] and CLC Genomics Workbench version 9.0.1 [4] were used to analyze read quality and to trim ambiguous, low-quality reads. After quality assessment and trimming, a total of 432,619 clean reads were obtained with an average length of 178.08 bp. The trimmed reads were assembled into 3,200 contigs with an average length of 1,611 bp and a GC content of 35.3% using the de novo assembly algorithm of CLC Genomics Workbench version 9.0.1.
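
The 15X sequencing depth listed in Table 1 can be approximated from the figures above as total clean bases divided by genome size. The sketch below assumes that depth was computed from the trimmed reads and the 5,154,790 bp genome size; the source does not state the exact formula, so this is one plausible reading rather than the authors' calculation. The function and constant names are illustrative.

```python
def sequencing_depth(n_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Approximate depth as (number of reads x mean read length) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * mean_read_length / genome_size


# Values taken from Table 1 and the paragraph above.
CLEAN_READS = 432_619
MEAN_CLEAN_LENGTH_BP = 178.08
GENOME_SIZE_BP = 5_154_790

depth = sequencing_depth(CLEAN_READS, MEAN_CLEAN_LENGTH_BP, GENOME_SIZE_BP)
print(f"Estimated depth: {depth:.1f}X")
```

Under these assumptions the estimate comes out just under 15X, in line with the value reported in Table 1.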

Genome sequence annotation and genomic data analysis

The draft genome contigs of Bacillus cereus 062011msu were annotated using the NCBI Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP) [5] and Rapid Annotation of microbial genomes using Subsystems Technology (RAST) version 2.0 (http://rast.nmpdr.org/) [6]. The PGAAP annotation of the isolate's genome showed a total of 7,061 CDS (2,301 protein coding genes and 4,760 pseudogenes) and 81 RNA genes, including 66 tRNAs, 10 rRNAs and 5 ncRNAs. The annotation details are given in the whole genome shotgun (WGS) project under the accession NTMF00000000. Genomic islands are sets of genes of probable horizontal origin that facilitate the diversification, adaptation and evolution of pathogenic microbes [7], [8]. The genomic islands in our study were predicted by submitting the PGAAP-generated GenBank file to IslandViewer 4 (http://www.pathogenomics.sfu.ca/islandviewer/) [8]. A total of 219 genes were clustered on 10 genetic islands. The data obtained from the RAST annotation server revealed that the draft genome contains 8721 coding sequences and 472 subsystems, with “Amino Acids and Derivatives” and “Carbohydrates” being the most represented subsystem features. In addition, the annotated subsystem features included 257 genes associated with “Virulence, Disease and Defense”: 154 genes associated with resistance to antibiotics and toxic compounds, 52 genes associated with the synthesis of antibacterial peptides (bacteriocins), 50 genes associated with invasion and intracellular resistance, and one gene associated with adhesion. The KEGG (Kyoto Encyclopedia of Genes and Genomes) biological pathways associated with the genome of Bacillus cereus 062011msu were identified by annotating the protein coding sequences against the KEGG pathway database using the BLAST2GO program [9]. A total of 3104 sequences were mapped to 116 different KEGG pathways.
Among them, the pathways associated with nucleotide metabolism, amino acid metabolism, metabolism of cofactors and vitamins, and carbohydrate metabolism were the most dominant KEGG pathways observed in the genome dataset. The orthologous groups associated with the protein coding genes of Bacillus cereus 062011msu were predicted and classified using the EggNog database (Evolutionary genealogy of genes) embedded within the BLAST2GO software [10]. The COG (Clusters of Orthologous Groups) data showed that the cluster for “Amino acid transport and metabolism” (381 sequences) forms the largest functional group. Among the other functional groups, the clusters for “Function unknown” (344 sequences), “General Function Prediction Only” (289 sequences), “Inorganic ion transport and metabolism” (281 sequences) and “Carbohydrate transport and metabolism” (261 sequences) were the most highly represented categories. Given the pathogenic nature of the strain, the virulence factors and toxin genes residing in the genome of Bacillus cereus 062011msu were further screened by annotating the coding sequences against the Virulence Factor Database (VFDB) [11] using local BLASTX with an E-value cutoff of 1E-5. A total of 1108 sequences homologous to 743 virulence factors and toxin genes were identified from the BLAST search. Among them, 56 genes showed complete sequence homology (100%) with the annotated genome dataset of B. cereus 062011msu (Table 2). Many of these genes encode flagellar proteins, indicating that the flagellar proteins might play a regulatory role in the pathogenicity of the bacterium. Previous in vitro experiments by Mariappan et al. (2012) reported that the pathogen was sensitive to tetracycline (TET) and ciprofloxacin (CPFX) [1]. The antibiotic resistance genes present in Bacillus cereus 062011msu were screened using the curated Antibiotic Resistance Genes Database (ARDB) (http://ardb.cbcb.umd.edu/) [12].
The data showed that the pathogen carries a total of 14 antibiotic resistance genes conferring resistance to antibiotics such as bacitracin, penicillin, fosfomycin, streptogramin A, chloramphenicol, doxorubicin, fluoroquinolone, puromycin, streptomycin, beta-lactam, lincomycin and fosmidomycin, thus confirming the susceptibility of the strain to TET and CPFX.
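
The VFDB screening step described above (an E-value cutoff of 1E-5, followed by selection of hits with 100% similarity) can be sketched as a filter over BLAST tabular output. The sketch assumes the standard 12-column tabular format (`-outfmt 6`) and uses the percent-identity column as a stand-in for the "Sim mean" value of Table 2. The sample rows, subject identifiers and function name are hypothetical, and the authors' actual post-processing is not described in the source.

```python
import csv
import io
from typing import Dict, List

# Column names of the standard 12-column BLAST tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text: str, max_evalue: float = 1e-5,
                min_identity: float = 0.0) -> List[Dict[str, str]]:
    """Keep hits whose E-value and percent identity pass the given thresholds."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=BLAST6_COLUMNS, delimiter="\t")
    kept = []
    for row in reader:
        if float(row["evalue"]) <= max_evalue and float(row["pident"]) >= min_identity:
            kept.append(row)
    return kept


# Hypothetical rows for illustration only (not taken from the real BLASTX run).
SAMPLE = "\n".join([
    "contig_A\tvf_001\t100.0\t80\t0\t0\t240\t1\t1\t80\t6.9e-53\t180",
    "contig_B\tvf_002\t87.5\t64\t8\t0\t10\t201\t5\t68\t2.0e-20\t95",
    "contig_C\tvf_003\t100.0\t12\t0\t0\t1\t36\t1\t12\t0.5\t20",
])

significant = filter_hits(SAMPLE)                    # E-value cutoff only
identical = filter_hits(SAMPLE, min_identity=100.0)  # cutoff plus 100% identity
print(len(significant), len(identical))
```

Applying the E-value cutoff first and the identity threshold second mirrors the order in the text, where the 1108 significant hits are narrowed to the 56 entries with complete homology.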

Genome sequence comparison, 16S and 23S rRNA analysis, and genome map visualization

Based on the genome sequence comparison performed on the RAST server, the closest neighboring strains of Bacillus cereus 062011msu were identified as Bacillus cereus AND1407 (score 544), Bacillus cereus MSX-D12 (score 406) and Bacillus cereus BAG3O-2 (score 387). Based on the local similarity of the aligned nucleotide sequences obtained with the rapid sequence comparison tool BLAST [13], the genome of Bacillus cereus 062011msu exhibited 99% sequence homology with the genomes of Bacillus cereus ATCC 10987, Bacillus cereus strain M3, Bacillus cereus FRI-35, Bacillus thuringiensis serovar finitimus YBT-020, Bacillus cereus strain CC-1, Bacillus cereus NC7401 and Bacillus cereus AH187. In microbial genomics research, comparison of the 16S rRNA gene sequence has emerged as a reliable technique for identifying new bacterial strains associated with pathogenicity and infections [14]. The 16S rRNA sequence deduced from the Bacillus cereus 062011msu genome was aligned to its nearby homologs using Clustal W multiple sequence alignment, and the phylogenetic analysis was performed by the maximum likelihood method with 100 bootstrap replicates using the MEGA7 software (www.megasoftware.net/) [15]. The phylogenetic tree based on 16S rRNA sequence comparison confirmed that the pathogenic strain belongs to the species Bacillus cereus and has a close evolutionary relationship with B. cereus ATCC 10987 and B. cereus FRI-35, as the three were clustered together in a monophyletic clade. We also performed a phylogenetic analysis based on 23S rRNA gene sequence comparison using the MEGA7 software. The 16S rRNA gene derived phylogenetic tree was concordant with the 23S rRNA gene tree, which also identified B. cereus ATCC 10987 as the closest evolutionary homolog of the pathogen.
The complete genomic map of Bacillus cereus 062011msu representing the GC content, GC skew graphs, coordinates and coding sequence features on both forward and reverse strands obtained from RAST annotation was generated by DNAPlotter (http://www.sanger.ac.uk/science/tools/dnaplotter) [16].
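
The GC content and GC skew tracks of such a circular map are simple per-window statistics. As a minimal sketch (not DNAPlotter's implementation), GC content is taken here as (G+C) divided by the window length and GC skew as (G−C)/(G+C). The function names, window size and example sequence are illustrative assumptions.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_skew(seq: str) -> float:
    """GC skew (G - C) / (G + C); 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_stats(seq: str, window: int) -> List[Tuple[int, float, float]]:
    """Return (start, GC content, GC skew) for consecutive non-overlapping windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        (start, gc_content(seq[start:start + window]), gc_skew(seq[start:start + window]))
        for start in range(0, len(seq), window)
    ]


# Hypothetical 12-base sequence split into 4-base windows.
for start, content, skew in windowed_stats("GGGCATATCCCG", 4):
    print(start, round(content, 2), round(skew, 2))
```

On a real genome the window would span thousands of bases, and a sign change in the skew track is commonly used as a hint for the replication origin and terminus.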

Specifications Table

Subject area | Biology
More specific subject area | Bioinformatics (Genomics)
Type of data | Table, figure
How data was acquired | Genome sequencing: Ion Torrent Personal Genome Machine (PGM) (Life Technologies, Carlsbad, CA). De novo sequence assembly: CLC Genomics Workbench version 9.0.1. Bioinformatics approaches: NCBI Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP), RAST genome annotation server (http://rast.nmpdr.org/), EggNog database integrated in BLAST2GO (eggnog.embl.de; https://www.blast2go.com/), MEGA7 (multiple sequence alignment and phylogenetic analysis), DNAPlotter (http://www.sanger.ac.uk/science/tools/dnaplotter).
Data format | Analyzed
Experimental factors | Genome sequencing, genome annotation, KEGG pathway analysis, orthologous group analysis, 16S rRNA and 23S rRNA based phylogeny.
Experimental features | The whole genome sequencing of Bacillus cereus 062011msu was performed on the Ion Torrent Personal Genome Machine (PGM) platform. Quality analysis, filtering and de novo assembly of the raw reads were performed with CLC Genomics Workbench 9.0.1. Genome annotation was done using the PGAAP pipeline and the RAST genome annotation server. Multiple sequence alignment and phylogenetic analysis based on 16S rRNA and 23S rRNA sequences were performed with the MEGA7 tool. The circular genome map of the species was generated by DNAPlotter.
Data source location | Maruthamputhur village, Alangulam, Tirunelveli District, Tamil Nadu, India (latitude: 8.8646 N, longitude: 77.4960 E).
Data accessibility | Genome analysis and annotation data are given within this article, and the raw data along with the NCBI PGAAP annotation were deposited at the NCBI repository: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA404036 (BioProject ID: 404036, BioSample: SAMN07629099). The Whole Genome Shotgun (WGS) project has been deposited at DDBJ/ENA/GenBank under the accession NTMF00000000 (https://www.ncbi.nlm.nih.gov/nuccore/NTMF00000000). The genome annotation data obtained from the RAST server are given in this article.
Related research article | “Bacillus sp. causing abscessation in sheep and goat population” by Mariappan et al. (2012) [1].

  14 in total

1.  KEGG: kyoto encyclopedia of genes and genomes.

Authors:  M Kanehisa; S Goto
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Basic local alignment search tool.

Authors:  S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal:  J Mol Biol       Date:  1990-10-05       Impact factor: 5.469

Review 3.  Impact of 16S rRNA gene sequence analysis for identification of bacteria on clinical microbiology and infectious diseases.

Authors:  Jill E Clarridge
Journal:  Clin Microbiol Rev       Date:  2004-10       Impact factor: 26.132

4.  MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets.

Authors:  Sudhir Kumar; Glen Stecher; Koichiro Tamura
Journal:  Mol Biol Evol       Date:  2016-03-22       Impact factor: 16.240

5.  A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers.

Authors:  Michael A Quail; Miriam Smith; Paul Coupland; Thomas D Otto; Simon R Harris; Thomas R Connor; Anna Bertoni; Harold P Swerdlow; Yong Gu
Journal:  BMC Genomics       Date:  2012-07-24       Impact factor: 3.969

6.  IslandViewer 4: expanded prediction of genomic islands for larger-scale datasets.

Authors:  Claire Bertelli; Matthew R Laird; Kelly P Williams; Britney Y Lau; Gemma Hoad; Geoffrey L Winsor; Fiona S L Brinkman
Journal:  Nucleic Acids Res       Date:  2017-07-03       Impact factor: 16.971

Review 7.  Genomic islands: tools of bacterial horizontal gene transfer and evolution.

Authors:  Mario Juhas; Jan Roelof van der Meer; Muriel Gaillard; Rosalind M Harding; Derek W Hood; Derrick W Crook
Journal:  FEMS Microbiol Rev       Date:  2008-10-29       Impact factor: 16.408

8.  ARDB--Antibiotic Resistance Genes Database.

Authors:  Bo Liu; Mihai Pop
Journal:  Nucleic Acids Res       Date:  2008-10-02       Impact factor: 16.971

9.  DNAPlotter: circular and linear interactive genome visualization.

Authors:  Tim Carver; Nick Thomson; Alan Bleasby; Matthew Berriman; Julian Parkhill
Journal:  Bioinformatics       Date:  2008-11-05       Impact factor: 6.937

10.  The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST).

Authors:  Ross Overbeek; Robert Olson; Gordon D Pusch; Gary J Olsen; James J Davis; Terry Disz; Robert A Edwards; Svetlana Gerdes; Bruce Parrello; Maulik Shukla; Veronika Vonstein; Alice R Wattam; Fangfang Xia; Rick Stevens
Journal:  Nucleic Acids Res       Date:  2013-11-29       Impact factor: 16.971
