Genomic sequence data and single nucleotide polymorphism genotyping of Bacillus anthracis strains isolated from animal anthrax outbreaks in Northern Cape Province, South Africa.

Kgaugelo Edward Lekota1,2, Ayesha Hassim1, Henriette van Heerden1.   

Abstract

This report presents genomic data, comprising sequence reads and draft genomes, of Bacillus anthracis isolates from anthrax outbreaks in animals in an endemic region of South Africa, together with genotyping of the strains using canonical single nucleotide polymorphisms (canSNPs). It is derived from an article entitled "Phylogenomic structure of B. anthracis strains in the Northern Cape Province, South Africa revealed novel single nucleotide polymorphisms". Whole genome sequences of twenty-three B. anthracis strains isolated during the 1998 and 2009 anthrax outbreaks in the Northern Cape Province (NCP), as well as of a strain from Botswana (6102_6B) and one from the Namibia-South Africa transfrontier conservation area (Sendlingsdrift, 6461_SP2), were obtained using both the HiSeq 2500 and MiSeq Illumina platforms. A mismatch amplification mutation assay with melt analysis (melt-MAMA) qPCR was used to identify the canSNP genotypes within the global population of B. anthracis. DNA sequencing data are available at the NCBI Sequence Read Archive and GenBank database under accession No. PRJNA580142 and PRJNA510736, respectively. A phylogenetic tree and canSNP typing profiles of the isolates are presented within this article.
© 2019 The Author(s).

Keywords:  Bacillus anthracis; Canonical single nucleotide polymorphism (canSNP); Whole genome sequencing (WGS)

Year:  2019        PMID: 31970271      PMCID: PMC6965700          DOI: 10.1016/j.dib.2019.105040

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


The data shed light on the draft genomes and genetic diversity of B. anthracis strains from two anthrax outbreaks, in 1998 and 2009, in the Northern Cape Province of South Africa. The data serve as a benchmark for other researchers to determine the evolution and genetic diversity of B. anthracis globally. The data could be used to determine the relationship between B. anthracis strains from South Africa and other areas and to expand the canSNP typing scheme using melt-MAMA. The data might enable trace-back within and between anthrax cases and outbreaks, especially within the context of southern Africa.

Data description

We present the genomic data and analysis of whole genome sequences of B. anthracis strains isolated from animal anthrax outbreaks in the Northern Cape Province. Sequence reads (in fastq format) and assembled genomes (in fasta format) were deposited at the NCBI SRA and GenBank database under project accession No. PRJNA580142 and PRJNA510736, respectively. The sample collection with accession numbers, the SNP genotyping and the genome assemblies are presented in Table 1, Table 2 and Table 3, respectively. Isolates were also grouped using the canonical SNP typing scheme [2] (primers in Table 4), which defines the phylogenetic branches (Fig. 1).
Table 1

Whole genome sequences of the Bacillus anthracis strain collection with their accession numbers submitted to GenBank and the Sequence Read Archive (SRA).

| Strain name | Host | Collection date | Location | Accession number | Sequence coverage |
|---|---|---|---|---|---|
| 2949_1D | Ovine | 10-May-2009 | South Africa: Northern Cape Province | RXZW00000000 | 145 |
| 2991_1B | Ovine | 10-May-2009 | South Africa: Northern Cape Province | RXZV00000000 | 199 |
| 3008_1B | Bovine | 10-May-2009 | South Africa: Northern Cape Province | RXZU00000000 | 155 |
| 3122_2B | Oryx gazella | 10-May-2009 | South Africa: Northern Cape Province | RXZT00000000 | 168 |
| 3132_1B | Tragelaphus strepsiceros | 10-May-2009 | South Africa: Northern Cape Province | RXZS00000000 | 201 |
| 3275_2D | Soil | 10-May-2009 | South Africa: Northern Cape Province | RXZR00000000 | 267 |
| 3517_1C | Tragelaphus strepsiceros | 10-May-2009 | South Africa: Northern Cape Province | RXZQ00000000 | 166 |
| 3517_2C | Tragelaphus strepsiceros | 10-May-2009 | South Africa: Northern Cape Province | RXZP00000000 | 137 |
| 3631_4C | Tragelaphus strepsiceros | 10-May-2009 | South Africa: Northern Cape Province | RXZO00000000 | 187 |
| 3631_3D | Tragelaphus strepsiceros | 10-May-2009 | South Africa: Northern Cape Province | RXZN00000000 | 189 |
| 3631_8D | Tragelaphus strepsiceros | 10-May-2009 | South Africa: Northern Cape Province | RXZM00000000 | 300 |
| 2110 | Ovis aries | 1998 | South Africa: Northern Cape Province | RXZL00000000 | 38 |
| JB10 | Equus burchellii quagga | 2009 | South Africa: Northern Cape Province | RXZK00000000 | 60 |
| JB25 | Tragelaphus strepsiceros | 2009 | South Africa: Northern Cape Province | SDEF00000000 | 80 |
| 3618_2D | Tragelaphus strepsiceros | 10-May-2009 | South Africa: Northern Cape Province | RXZJ00000000 | 178 |
| 6461_SP2 | Capra aegagrus | 2009 | South Africa: Northern Cape Province | SRP227303; SAMN13151840; SRR10357978 | 20 |
| 6102_6B | Loxodonta | 2009 | Botswana | SRP227303; SAMN13151841; SRR10357979 | 21 |
| 3631_7C | Soil | 2009 | South Africa: Northern Cape Province | SRP227303; SAMN13151842; SRR10357981 | 24 |
| 5838 | Alcelaphus buselaphus | 1998 | South Africa: Northern Cape Province | SRP227303; SAMN13151843; SRR10357980 | 17 |
| 2991_2B | Ovine | 2009 | South Africa: Northern Cape Province | SRP227303; SAMN13151844; SRR10357985 | 19 |
| 3080_3B | Bovine | 2009 | South Africa: Northern Cape Province | SRP227303; SAMN13151845; SRR10357983 | 17 |
| 3079_1C | Oryx gazella | 2009 | South Africa: Northern Cape Province | SRP227303; SAMN13151846; SRR10357984 | 25 |
| 3080_5A | Bovine | 2009 | South Africa: Northern Cape Province | SRP227303; SAMN13151847; SRR10357982 | 26 |
| 3080_1B | Bovine | 2009 | South Africa: Northern Cape Province | SRP227303; SAMN13151848; SRR10357977 | 12 |
| 3090_1B | Unknown | 2009 | South Africa: Northern Cape Province | SRP228283; SAMN10614343; SRR10390628 | 26 |
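Table 2

Canonical SNPs used for genotyping of B. anthracis strains. SNPs are in relation to the B. anthracis Ames ancestor chromosome (NC_007530.2).

| B. anthracis strains | SNP-branch | A.Br.006 | A.Br.007 | A.Br.008 | A.Br.005 | A.Br.004 | A.Br.003 | A.Br.002 | A.Br.001 | A.Br.009 | A.Br.011 | A.Br.014 | A.Br.013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ancestral Template SNP | | C | A | A | A | T | T | C | T | A | G | T | A |
| Derived Template SNP | | A | G | C | G | C | C | T | C | G | A | C | G |
| Ames ancestor | A.Br.001 (Ames) | A | A | A | G | C | C | T | C | A | G | T | A |
| Sterne | A.Br.002 (Sterne) | A | A | A | G | C | C | T | T | A | G | T | A |
| 3080_5A | A.Br.002 (Sterne) | A | A | A | G | C | C | T | T | A | G | T | A |
| 3080_1B | A.Br.002 (Sterne) | A | A | A | G | C | C | T | T | A | G | T | A |
| 6102_6B | A.Br.005/006 (Ancient A) | A | A | A | A | T | T | C | T | A | G | T | A |
| 6461_SP2 | A.Br.005/006 (Ancient A) | A | A | A | A | T | T | C | T | A | G | T | A |
| 2110 | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 5838 | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3631_1C | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3080_3B | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3079_1C | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3090_1B | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| JB10/NC14 | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| JB25/NC_29 | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 2991_2B | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3618_2D | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3517_1C | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3631_4C | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3631_7C | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3275_2D | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3122_2B | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3008_1B | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 2949_1D | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 2991_1B | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3517_2C | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3132_1B | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3631_3D | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| 3631_8D | A.Br.003/004 (A.Br.101) | A | A | A | G | C | C | C | T | A | G | C | A |
| Aust94 | A.Br.003/004 (Aust94) | A | A | A | G | C | C | C | T | A | G | C | G |
| Vollum | A.Br.007 (Vollum) | A | G | A | G | T | T | C | T | A | G | T | A |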
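The profiles in Table 2 can be read as 12-letter strings in the column order of the table, which makes branch assignment a simple lookup. The sketch below is illustrative only: the marker order, the derived template and the reference profiles are copied from Table 2, while the function names `derived_markers` and `assign_branch` are hypothetical and do not describe the authors' own workflow.

```python
# Marker order and derived template alleles as listed in Table 2.
MARKERS = [
    "A.Br.006", "A.Br.007", "A.Br.008", "A.Br.005", "A.Br.004", "A.Br.003",
    "A.Br.002", "A.Br.001", "A.Br.009", "A.Br.011", "A.Br.014", "A.Br.013",
]
DERIVED_TEMPLATE = "AGCGCCTCGACG"

# Distinct 12-SNP profiles observed in Table 2 and the branch given for them.
REFERENCE_PROFILES = {
    "AAAGCCTCAGTA": "A.Br.001 (Ames)",
    "AAAGCCTTAGTA": "A.Br.002 (Sterne)",
    "AAAATTCTAGTA": "A.Br.005/006 (Ancient A)",
    "AAAGCCCTAGCA": "A.Br.003/004 (A.Br.101)",
    "AAAGCCCTAGCG": "A.Br.003/004 (Aust94)",
    "AGAGTTCTAGTA": "A.Br.007 (Vollum)",
}


def derived_markers(profile):
    """List the markers at which a profile carries the derived template allele."""
    profile = profile.upper()
    if len(profile) != len(MARKERS):
        raise ValueError("profile must contain one allele per marker")
    return [
        marker
        for marker, allele, derived in zip(MARKERS, profile, DERIVED_TEMPLATE)
        if allele == derived
    ]


def assign_branch(profile):
    """Return the Table 2 branch for an exact profile match, else 'unassigned'."""
    return REFERENCE_PROFILES.get(profile.upper(), "unassigned")


# Profile shared by most Northern Cape Province strains in Table 2.
print(assign_branch("AAAGCCCTAGCA"))
print(derived_markers("AAAGCCCTAGCA"))
```

An exact-match lookup is deliberately conservative: a profile that differs from every row of Table 2 is reported as unassigned rather than forced onto the nearest branch.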
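Table 3

Genomic features of the de novo assemblies of B. anthracis strains (n = 15) generated using CLC Genomics Workbench.

| Strain name | Sequence coverage | Number of contigs | N50 (bp) | Minimum contig size (bp) | Maximum contig size (bp) | GC content (%) | Genome size (bp) | Total coding sequences (CDSs) | Total number of RNAs |
|---|---|---|---|---|---|---|---|---|---|
| 2949_1D | 145 | 441 | 28 406 | 423 | 125 072 | 35.1 | 5 147 319 | 5 764 | 65 |
| 2991_1B | 199 | 378 | 38 630 | 316 | 185 192 | 35.1 | 5 395 612 | 5 736 | 54 |
| 3008_1B | 155 | 442 | 34 402 | 406 | 226 189 | 35.1 | 5 418 987 | 5 763 | 63 |
| 3122_2B | 168 | 431 | 34 419 | 361 | 175 230 | 35.1 | 5 401 847 | 5 740 | 54 |
| 3132_1B | 201 | 170 | 74 712 | 146 | 335 422 | 35.1 | 5 350 330 | 5 611 | 97 |
| 3275_2D | 267 | 751 | 14 738 | 509 | 89 998 | 35.1 | 5 352 180 | 5 463 | 59 |
| 3517_1C | 166 | 121 | 203 477 | 354 | 343 375 | 35.1 | 5 416 293 | 5 692 | 68 |
| 3517_2C | 137 | 1 194 | 9 613 | 352 | 55 932 | 35.1 | 5 265 628 | 5 869 | 37 |
| 3631_4C | 187 | 385 | 35 768 | 418 | 177 852 | 35.1 | 5 402 081 | 5 718 | 68 |
| 3631_3D | 189 | 513 | 22 221 | 415 | 108 007 | 35.1 | 4 654 382 | 5 766 | 52 |
| 3631_8D | 300 | 882 | 14 279 | 401 | 98 835 | 35.1 | 5 252 949 | 5 717 | 68 |
| 2110 | 38 | 856 | 7 046 | 517 | 77 020 | 35.0 | 3 843 425 | 5 906 | 74 |
| JB10 | 60 | 1 856 | 6 493 | 153 | 50 654 | 35.1 | 5 180 538 | 5 861 | 34 |
| JB25 | 80 | 136 | 91 967 | 519 | 646 630 | 35.1 | 5 422 668 | 5 695 | 88 |
| 3618_2D | 178 | 72 | 154 041 | 2 803 | 489 427 | 35.1 | 5 417 873 | 5 674 | 62 |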
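The N50 column of Table 3 summarises assembly contiguity: it is the contig length at which contigs of that size or longer account for at least half of the total assembly length. The sketch below shows the usual calculation under that standard definition. The contig lengths are a made-up toy example, not data from this study, and how CLC Genomics Workbench computes its reported value is not stated in the source.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until the
    running total reaches half of the assembly length; the length of the
    contig that crosses that threshold is the N50.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly: total length 300, half is 150.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # 100 + 80 = 180 >= 150, so the N50 is 80
```

Because the statistic depends only on the length distribution, a low N50 alongside a high contig count in Table 3 indicates a more fragmented draft assembly.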
Table 4

Melt-MAMA primers targeting canonical SNPs, taken from the existing Birdsell et al. (2012) primers and used in this study for the phylogenetic branches.

Existing primers by Birdsell et al., 2012

| Assay name^a | Reference genome position | Derived MAMA 5′-3′ | Ancestral MAMA 5′-3′ | Common reverse 5′-3′ | Annealing temperature (°C) |
|---|---|---|---|---|---|
| A.Br.001 | 182 106 | cggggcggggcggggcgggcAGAAGGAGCAAGTAATGTTATAGGTTTAaGT | GGAGCAAGTAATGTTATAGGTTTAcGC | ACCTAAAATCGATAAAGCGACTGC | 55 |
| A.Br.002 | 947 760 | cggggcggggcggggcgggcAGAAGGAGCAAGTAATGTTATAGGTTTAaGT | GGAGCAAGTAATGTTATAGGTTTAcGC | ACCTAAAATCGATAAAGCGACTGC | 55 |
| A.Br.003 | 1 493 280 | cggggcggggcggggcgggcAATTTAGATTTTCGTGTCGAATTAtGC | AATTTAGATTTTCGTGTCGAATTAgGT | TGTATAAAAACCTCCTTTTTCTACCTCAA | 55 |
| A.Br.004 | 3 600 786 | cggggcggggcggggcgggcCGCCGTCATACTTTGGAAaGC | CGCCGTCATACTTTGGAAcGT | GAATTGGTGGAGCTATGGAAGGATTA | 60 |
| A.Br.005 | 3 842 864 | cggggcggggcggggcgggcGAAAGATATATAAAAATGTTTTTTTATTTCGTtTG | GAAAGATATATAAAAATGTTTTTTTATTTCGTcTA | GCTGCGTTTAGTTATGCAAATC | 55 |
| A.Br.006 | 162 509 | cggggcggggcggggcgggcAATATGTTGTTGATCATTCCATCGCtTA | TATGTTGTTGATCATTCCATCGCgTC | TAGCGTTTTTAAGTTCATCATACCCATGC | 55 |
| A.Br.007 | 266 439 | cggggcggggcggggcgggcACAAGGTGGTAGTATTCGAGCTGAtTG | AATTACAAGGTGGTAGTATTCGAGCTGAcTA | CGAGACGATAAACTGAATAATACCATCCT | 62.5 |
| A.Br.008 | 3 947 375 | cggggcggggcggggcgggcGTTACAAATATACGTTTAACAAGCcGC | AAAAGTTACAAATATACGTTTAACAAGCtGA | CTACGCTATACGTTTTAGATGGAGATAATTC | 55 |
| A.Br.009 | 2 589 947 | cggggcggggcggggcgggcCCACTGTTTTTGAACGGCTcTG | GCCACTGTTTTTGAACGGCTaTA | TTTTAGGTATATTAACTGCGGATGATGC | 60 |
| A.Br.011 | 1 455 402 | cggggcggggcggggcgggcCATAAAAGAAATCGGTACAATAGAAtAG | CATAAAAGAAATCGGTACAATAGAAcAA | TCGGATATGATACCGATACCTTCTTATC | 55 |
| A.Br.014 | 5 078 168 | ggggcggggcggggcggggcggggcAATGGTAAATTGTAATGTTGAGCTtC | AATGGTAAATTGTAATGTTGAGCTgT | TTTTTACTAAAAAATTACTTTTTTTGAAAA | 57 |
| A.Br.013 | 2 465 446 | ggggcggggcggggcggggcggggcTTGTAAAAATTCTATGTGAATCACATtG | TTGTAAAAATTCTATGTGAATCACATcA | TTATCCACCTTCTTATAATTATTTATTACTAT | 57 |

GC-clamp (cggggcggggcggggcgggc).

Bacillus anthracis Ames ancestor reference genome (NC_007530.2).
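The primer notation in Table 4 packs three features into each allele-specific primer: an optional 5′ GC-clamp in lowercase, a single lowercase base within the annealing region, and a 3′-terminal base that differs between the derived and ancestral primers. The sketch below separates these parts, using the A.Br.004 primers as input. Reading the internal lowercase base as the deliberate destabilising mismatch of the MAMA design is an interpretation of the table notation, and the function name `parse_mama_primer` is hypothetical.

```python
# GC-clamp sequences as they appear at the 5' end of the derived primers in Table 4.
GC_CLAMPS = (
    "ggggcggggcggggcggggcggggc",
    "cggggcggggcggggcgggc",
)


def parse_mama_primer(primer):
    """Split a Table 4 primer into clamp, annealing body and 3' features.

    Assumption: after removing the clamp, exactly one lowercase base remains
    and marks the deliberate mismatch; the last base is allele-specific.
    """
    clamp = ""
    for candidate in GC_CLAMPS:
        if primer.startswith(candidate):
            clamp = candidate
            break
    body = primer[len(clamp):]
    lowercase_positions = [i for i, base in enumerate(body) if base.islower()]
    if len(lowercase_positions) != 1:
        raise ValueError("expected exactly one lowercase mismatch base in the primer body")
    index = lowercase_positions[0]
    return {
        "clamp": clamp,
        "body": body,
        "mismatch_base": body[index],
        # 2 means penultimate position, 3 means antepenultimate position.
        "mismatch_offset_from_3prime": len(body) - index,
        "allele_base": body[-1],
    }


derived = parse_mama_primer("cggggcggggcggggcgggcCGCCGTCATACTTTGGAAaGC")
ancestral = parse_mama_primer("CGCCGTCATACTTTGGAAcGT")
print(derived["allele_base"], ancestral["allele_base"])
```

For A.Br.004 the two primers share the same annealing region and differ at the 3′-terminal base, and only the derived primer carries the clamp, which is what allows the two amplicons to be distinguished by melting temperature.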

Fig. 1

Maximum likelihood phylogeny of the major canonical single nucleotide polymorphism (canSNP) groups for the 26 B. anthracis strains as well as the B. anthracis Ames ancestor, Vollum and Sterne control sequences. Most of the B. anthracis strains (n = 21) grouped in the canSNP A.Br.003/004 (Aust94) clade (red), while two strains isolated from bovine, 3080_1B and 3080_5A, grouped in the A.Br.001/002 (Sterne) group (green), and the isolates from Botswana (6102_6B) and Sendlingsdrift (6461_SP2) grouped in the A.Br.005/006 (Ancient A) group (purple).

Experimental design, materials, and methods

Diagnostic real-time PCR for chromosomal and plasmid markers of B. anthracis

The identification of B. anthracis isolates was performed as described by WHO [3]. Each 20 μl PCR reaction consisted of 10 μl of FastStart Essential master mix (Roche Applied Science), 0.5 μM of each primer, 0.2 μM of each probe for the chromosomal and plasmid target pairs, labelled with fluorescein on the one and LCRed640 on the other (Tib MolBiol GmbH, Germany), and 2.5 μl of template DNA. The PCR conditions on a LightCycler™ Nano (Roche Applied Science) were as described by WHO [3]: an initial cycle at 95 °C for 10 minutes (slope 20 °C/second), followed by 40 cycles of 95 °C for 10 seconds, 57 °C for 20 seconds and 72 °C for 30 seconds (slope 20 °C/second), with a single signal acquisition at the end of each annealing step. This was followed by denaturation at 95 °C for 3 seconds (slope 20 °C/second), 40 °C for 30 seconds (slope 20 °C/second) and 80 °C for 3 seconds at a slope of 0.1 °C/second with continuous acquisition of the signal, and finally by cooling to 40 °C for 30 seconds (slope 20 °C/second).

Genotyping of B. anthracis strains using Melt-MAMA assays

Melt-MAMA assays of the canSNP markers were used to amplify the DNA of the NCP B. anthracis strains. The panel included 12 canSNPs that were used for the grouping of the B. anthracis strains (n = 26) using existing Melt-MAMA primers (Table 4); derived and ancestral controls were created as described by Birdsell et al. [2]. Each reaction included 2.5 μl of DNA diluted in 1× FastStart DNA Green Master (Roche Applied Science) with an ancestral forward and a derived forward SNP target primer (GC-clamp: no-GC-clamp) and a common reverse primer (Inqaba Biotec™) (Table 4), at a starting concentration of 0.2 μM, depending on the ratio indicated, which allowed for separation of melt peaks by at least 5 °C. Thermocycling parameters on the LightCycler™ 96 (Roche Applied Science) were 95 °C for 10 minutes, followed by 35 cycles of 95 °C for 15 seconds and 55 °C to 60 °C (oligonucleotide dependent) for 1 minute. End-point PCR amplicons were subjected to melt analysis using a dissociation protocol comprising 95 °C for 15 seconds, followed by incremental temperature ramping (0.1 °C) from 60 °C to 95 °C. SYBR Green fluorescence intensity was measured at 530 nm at each ramp interval and plotted against temperature, giving separate melt peaks for each SNP allele. Controls included in every run were DNA from the B. anthracis Ames, Vollum and Sterne 34F2 strains. Phylogenetic relationships between the 26 B. anthracis strains were determined in MEGA version 7 [4] using the maximum likelihood method based on the Tamura three-parameter model. The tree was generated with 500 bootstrap replicates.
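The melt analysis described above amounts to locating the temperature at which fluorescence drops fastest, then comparing that melt peak with the peaks of the derived and ancestral controls. The sketch below illustrates the idea on a synthetic curve sampled at 0.1 °C steps from 60 °C to 95 °C. Everything numeric in it is an assumption for illustration: the logistic curve shape, the two control melting temperatures (chosen 5 °C apart, with the GC-clamped derived product melting higher) and the 1 °C calling tolerance are not values from this study, and instrument software would normally perform this step.

```python
import math


def synthetic_melt_curve(temps, tm, width=0.5):
    """Hypothetical fluorescence curve that falls around a melting temperature."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]


def melt_peak(temps, fluorescence):
    """Return the temperature at which the negative slope -dF/dT is largest."""
    best_temp = None
    best_slope = float("-inf")
    for i in range(1, len(temps)):
        slope = -(fluorescence[i] - fluorescence[i - 1]) / (temps[i] - temps[i - 1])
        if slope > best_slope:
            best_slope = slope
            best_temp = (temps[i] + temps[i - 1]) / 2
    return best_temp


def call_allele(peak, ancestral_tm, derived_tm, tolerance=1.0):
    """Assign an allele when the peak lies within the tolerance of a control."""
    if abs(peak - derived_tm) <= tolerance:
        return "derived"
    if abs(peak - ancestral_tm) <= tolerance:
        return "ancestral"
    return "no call"


# Ramp from 60 °C to 95 °C in 0.1 °C increments, as in the dissociation protocol.
temps = [60 + 0.1 * i for i in range(351)]

# Assumed control melting temperatures, separated by 5 °C.
ANCESTRAL_TM = 78.25
DERIVED_TM = 83.25

sample = synthetic_melt_curve(temps, DERIVED_TM)
peak = melt_peak(temps, sample)
print(call_allele(peak, ANCESTRAL_TM, DERIVED_TM))
```

Requiring the peaks to be separated by several degrees, as the assay design does, keeps the call robust to small run-to-run shifts in melting temperature.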

High-throughput sequencing and bioinformatics analysis

The DNA samples that were extracted from B. anthracis were subjected to library preparation using the Nextera XT DNA Sample Prep kit (Illumina-compatible, Epicentre Biotechnology). Sequence reads of the B. anthracis genomes were generated on the HiSeq 2500 and MiSeq platforms. For the HiSeq 2500 platform, clusters were generated on the flow cell using the HiSeq Paired-End Cluster Generation kit (Illumina, USA). Sequencing of paired-end libraries was performed on the Illumina MiSeq and HiSeq 2500 sequencers using the 200-cycle SBS (sequencing by synthesis) sequencing v3 kit (Illumina, USA) and the HiSeq Sequencing Kit (200 cycles) (Illumina, USA), respectively. The quality of the sequenced reads was assessed using FastQC software version 0.10.1 [5]. Trimmomatic version 0.33 [6] was used to remove sequencing adapters and ambiguous nucleotide reads. De novo assemblies of the paired-end reads were performed using CLC Genomics Workbench version 11.1 (CLC, Denmark). The assembled contigs were ordered with the Mauve tool version 2.3.1 [7] against the B. anthracis Ames ancestor (GenBank accession numbers NC_007530.2, NC_007322.2 and NC_007323.3) in order to assess the accuracy and efficiency of the contigs. All trimmed sequence reads were also mapped to the reference using the Burrows-Wheeler Aligner (BWA) version 0.7.12 [8] to determine the B. anthracis replicons, i.e. the chromosome and the two plasmids. Assembled genomes were annotated using the NCBI Prokaryotic Genome Annotation Pipeline. Sequence reads were deposited at NCBI in the Sequence Read Archive (SRA), and assembled genomes in GenBank.
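The command-line steps of this workflow (read quality control, trimming and reference mapping) can be sketched as a small command builder. This is a hedged illustration, not the authors' commands: the source does not report any tool parameters, so the trimming settings, the choice of the `bwa mem` algorithm, all file names and the helper name `build_pipeline_commands` are assumptions. Assembly in CLC Genomics Workbench and contig ordering in Mauve are omitted here.

```python
def build_pipeline_commands(sample, reads_1, reads_2, reference, adapters,
                            trimmomatic_jar="trimmomatic-0.33.jar"):
    """Build (but do not run) illustrative commands for QC, trimming and mapping.

    All parameter values are placeholders chosen for illustration; the source
    does not state which settings were used.
    """
    paired_1 = f"{sample}_R1.paired.fastq.gz"
    unpaired_1 = f"{sample}_R1.unpaired.fastq.gz"
    paired_2 = f"{sample}_R2.paired.fastq.gz"
    unpaired_2 = f"{sample}_R2.unpaired.fastq.gz"
    return [
        # Read quality assessment with FastQC.
        ["fastqc", reads_1, reads_2],
        # Adapter and quality trimming with Trimmomatic in paired-end mode.
        ["java", "-jar", trimmomatic_jar, "PE",
         reads_1, reads_2, paired_1, unpaired_1, paired_2, unpaired_2,
         f"ILLUMINACLIP:{adapters}:2:30:10", "SLIDINGWINDOW:4:20", "MINLEN:36"],
        # Index the reference, then map the trimmed pairs (SAM goes to stdout).
        ["bwa", "index", reference],
        ["bwa", "mem", reference, paired_1, paired_2],
    ]


commands = build_pipeline_commands(
    sample="example_strain",                # hypothetical sample name
    reads_1="example_strain_R1.fastq.gz",   # hypothetical input files
    reads_2="example_strain_R2.fastq.gz",
    reference="ames_ancestor.fasta",        # hypothetical reference FASTA
    adapters="adapters.fa",                 # hypothetical adapter file
)
for command in commands:
    print(" ".join(command))
```

Returning the commands as argument lists keeps the sketch side-effect free; each list could be passed to `subprocess.run` once the paths and parameters have been checked against the installed tool versions.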

Specifications Table

| Subject | Microbial genomics |
|---|---|
| Specific subject area | Comparative microbial genomics of B. anthracis strains for evolution and genetic diversity using single nucleotide polymorphisms (SNPs) |
| Type of data | Sequence files, table, figure |
| How data were acquired | DNA extraction was performed on pure cultures using the DNA Mini kit (Qiagen). High-throughput DNA sequencing was performed on the Illumina HiSeq 2500 and MiSeq sequencing systems. De novo assembly was performed using CLC Genomics Workbench version 11.1. Assembled genomes were annotated using the NCBI Prokaryotic Genome Annotation Pipeline version 4.7. SNP genotyping with the canSNP typing scheme was performed on the LightCycler™ 96 (Roche Applied Science). MEGA version 7 was used to generate the phylogenetic tree. |
| Data format | Raw and analysed data of whole genome sequences (fastq and fasta) |
| Parameters for data collection | Samples were collected from animals that died of anthrax. Isolated pure cultures from sheep blood agar were used for DNA extractions, genotyping and sequencing. |
| Description of data collection | Pure culture isolates were identified using classical bacteriological methods including penicillin and bacteriophage sensitivity. DNA samples of these isolates were verified using B. anthracis plasmid and chromosomal gene targets in real-time PCR. Trimmed sequence reads were used for de novo assembly. |
| Data source location | University of Pretoria, Department of Veterinary and Tropical Diseases, Pretoria, South Africa |
| Data accessibility | The sequence data were deposited in the Sequence Read Archive (SRA) and GenBank at the National Center for Biotechnology Information (NCBI). Accession numbers are included in this manuscript in table format. BioProject numbers: PRJNA510736 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA510736); PRJNA580142 (https://www.ncbi.nlm.nih.gov/sra/PRJNA580142). With the article. |
| Related research article | Lekota KE, Bezuidt OKI, Mafofo J, Rees J, Muchadeyi FC, Madoroba E, van Heerden H. Whole genome sequencing and identification of Bacillus endophyticus and B. anthracis isolated from anthrax outbreaks in South Africa. BMC Microbiology (2018) 18:67. doi: 10.1186/s12866-018-1205-9 [1] |
References

[1] Whole genome sequencing and identification of Bacillus endophyticus and B. anthracis isolated from anthrax outbreaks in South Africa.
Authors: Kgaugelo Edward Lekota; Oliver Keoagile Ignatius Bezuidt; Joseph Mafofo; Jasper Rees; Farai Catherine Muchadeyi; Evelyn Madoroba; Henriette van Heerden
Journal: BMC Microbiol. Date: 2018-07-09

[2] Melt analysis of mismatch amplification mutation assays (Melt-MAMA): a functional study of a cost-effective SNP genotyping assay in bacterial models.
Authors: Dawn N Birdsell; Talima Pearson; Erin P Price; Heidie M Hornstra; Roxanne D Nera; Nathan Stone; Jeffrey Gruendike; Emily L Kaufman; Amanda H Pettus; Audriana N Hurbon; Jordan L Buchhagen; N Jane Harms; Gvantsa Chanturia; Miklos Gyuranecz; David M Wagner; Paul S Keim
Journal: PLoS One. Date: 2012-03-16

[4] MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets.
Authors: Sudhir Kumar; Glen Stecher; Koichiro Tamura
Journal: Mol Biol Evol. Date: 2016-03-22

[6] Trimmomatic: a flexible trimmer for Illumina sequence data.
Authors: Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal: Bioinformatics. Date: 2014-04-01

[7] Mauve: multiple alignment of conserved genomic sequence with rearrangements.
Authors: Aaron C E Darling; Bob Mau; Frederick R Blattner; Nicole T Perna
Journal: Genome Res. Date: 2004-07

[8] Fast and accurate short read alignment with Burrows-Wheeler transform.
Authors: Heng Li; Richard Durbin
Journal: Bioinformatics. Date: 2009-05-18
