
Genome of Leptospira borgpetersenii strain 4E, a highly virulent isolate obtained from Mus musculus in southern Brazil.

Marcus Redü Eslabão, Frederico Schmitt Kremer, Rommel Thiago Juca Ramos, Artur Luiz da Costa da Silva, Vasco Ariston de Carvalho Azevedo, Luciano da Silva Pinto, Éverton Fagonde da Silva, Odir Antônio Dellagostin.

Abstract

A previous study by our group reported the isolation and characterisation of Leptospira borgpetersenii serogroup Ballum strain 4E. This strain is of particular interest because it is highly virulent in the hamster model. In this study, we performed whole-genome shotgun sequencing of the strain using the SOLiD sequencing platform. By assembling and analysing the new genome, we were able to identify novel features that had previously been overlooked in genome annotations of other strains belonging to the same species.

Year:  2018        PMID: 29236926      PMCID: PMC5722270          DOI: 10.1590/0074-02760170111

Source DB:  PubMed          Journal:  Mem Inst Oswaldo Cruz        ISSN: 0074-0276            Impact factor:   2.743


The Leptospira genus consists of 23 species of bacteria (Boonsilp et al. 2013, Bourhy et al. 2014), of which at least nine are naturally pathogenic, five are opportunists (“intermediary pathogenic”), and the remainder are saprophytes (non-pathogenic). L. interrogans is the most commonly reported cause of leptospirosis, an infection caused by pathogenic leptospires; however, other species, such as L. borgpetersenii, L. kirschneri, and L. santarosai, are also associated with leptospirosis and are responsible for many infections and deaths in both humans and animals (Guerra 2009). Leptospirosis is a zoonotic disease with worldwide distribution that has re-emerged as a public health problem in many countries in recent years, especially in the tropics (Guerra 2013).

L. borgpetersenii serovar Ballum strain 4E was isolated from mice (Mus musculus) in the suburban area of Pelotas, a city in southern Brazil (da Silva et al. 2010). Previous studies have demonstrated that this strain has an LD50 (lethal dose for 50% of the population) of ~5.18 leptospires in a hamster model. As such, it is more virulent than standard model strains such as L. interrogans serovar Copenhageni strain Fiocruz L1-130 (LD50 = ~80 leptospires) (Diniz et al. 2011). The characterisation of highly virulent strains may provide data that extend our understanding of the pathogenesis of these bacteria and lead to the development of new vaccines. It may also generate insights that are useful for epidemiological surveillance. In the present study, we performed a whole-genome shotgun analysis of L. borgpetersenii serovar Ballum strain 4E to develop a more comprehensive characterisation of this isolate.

Bacterial culture and DNA extraction were performed in accordance with previously described methods (Kremer et al. 2016b). Whole-genome shotgun sequencing was performed using the ABI SOLiD v. 4 sequencing platform with a 50 base-pair (bp) single-end library. Raw reads in colour-space FASTA format (csFASTA) were pre-processed using SAET (https://www.thermofisher.com/) and converted into FASTQ format using our in-house Python script cs2q (http://labbioinfo.ufpel.edu.br/cs2q). Two assembly approaches were evaluated for the L. borgpetersenii strain 4E genome: de novo assembly and reference-guided assembly. De novo assembly was performed using Velvet with different values of k-mer length, expected coverage and coverage cutoff, and the assembly metrics were assessed using QUAST (Gurevich et al. 2013). Reference-guided assembly was performed by mapping the reads to the genome of L. borgpetersenii serovar Ballum strain 56604 (GenBank: CP012029.1, CP012030.1) using SMALT (www.sanger.ac.uk/science/tools/smalt-0). The resulting SAM file was then converted to BAM format and sorted using Samtools before a consensus sequence was extracted using Samtools, BCFtools, VCFutils.pl (Li et al. 2009) and GATK (McKenna et al. 2010). Genome annotation was performed using Genix (Kremer et al. 2016a) and manually reviewed and curated using Artemis (Rutherford et al. 2000). Variant calling was performed on the BAM file of aligned reads using Samtools, BCFtools, and VCFutils.pl to identify single nucleotide polymorphisms (SNPs) and insertions and deletions (INDELs). The effect of each variant was inferred based on the annotation of L. borgpetersenii serovar Ballum strain 56604 using SnpEff (Reumers 2004).

The reference-guided assembly covered > 99.99% of the reference sequence, with a mean coverage of ~400x.
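
The two coverage figures quoted here (fraction of the reference covered and mean depth) can be illustrated with a minimal Python sketch. It assumes per-base read depths are already available as a list of integers, for example exported from the sorted BAM file; the function names and the toy depth values are hypothetical and are not taken from the study.

```python
def coverage_summary(depths):
    """Return (breadth, mean_depth) for a list of per-base read depths.

    breadth    = fraction of reference positions covered by at least one read
    mean_depth = average depth over all reference positions
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d > 0)
    return covered / len(depths), sum(depths) / len(depths)


def uncovered_intervals(depths):
    """Return 0-based, half-open (start, end) intervals with zero depth."""
    gaps = []
    start = None
    for i, d in enumerate(depths):
        if d == 0 and start is None:
            start = i                  # a gap opens here
        elif d > 0 and start is not None:
            gaps.append((start, i))    # the gap closes before position i
            start = None
    if start is not None:              # gap runs to the end of the reference
        gaps.append((start, len(depths)))
    return gaps


# Toy depths (hypothetical values, not data from the study).
toy_depths = [400, 410, 0, 0, 395, 405, 0, 390]
breadth, mean_depth = coverage_summary(toy_depths)
gaps = uncovered_intervals(toy_depths)
print(breadth, mean_depth, gaps)
```

Zero-depth intervals found this way correspond to the kind of assembly gaps that a reference-guided approach leaves where reads cannot be placed.
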
A lack of coverage was identified in five assembly gaps, which were associated with mobile elements such as transposons. Because these elements can change position in the genome and are often present in multiple copies, they usually produce gaps in reference-guided assemblies and collapse into a single contig in de novo assemblies from short reads. The de novo assemblies generated by Velvet were highly fragmented, with more than 5,000 contigs and a very low N50 (53), which made them unsuitable for any downstream analysis.

An overview of the features identified in the genome of L. borgpetersenii serovar Ballum strain 4E is shown in Table I. We identified a total of 3,469 coding DNA sequences (CDSs), 37 transfer RNAs (tRNAs), four ribosomal RNAs (rRNAs), one transfer-messenger RNA (tmRNA) and five riboswitch loci. Although the protein-coding genes found were almost the same as those identified in the genome of strain 56604, our annotation pipeline identified non-coding features that were overlooked in the reference annotation: a tmRNA gene and riboswitches. tmRNAs act as tRNAs and contain a small open reading frame (ORF) that encodes a peptide involved in many regulatory processes, including targeting proteins for degradation (Hayes & Keiler 2010). Riboswitches are non-coding motifs present in the untranslated regions (UTRs) of some messenger RNAs (mRNAs); they act as cis-regulatory elements and bind specific metabolites to inhibit gene expression. Riboswitches are typically found in genes associated with vitamin metabolism, e.g., cobalamin (Garst et al. 2011, Serganov & Nudler 2013). Previous studies have demonstrated that riboswitch-regulated cobalamin (B12) autotrophy is a virulence factor in the Leptospira genus (Fouts et al. 2016). Therefore, a deeper annotation of the non-coding features may provide a better description of the resulting transcriptome.
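
For readers unfamiliar with the N50 metric mentioned above, the following Python sketch shows the usual definition: the length of the contig at which the running sum of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the example lengths are illustrative assumptions; this is not the code QUAST uses internally.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            # Contigs of this length or longer hold at least half the assembly.
            return length


# Illustrative contig lengths (not data from the study).
example_n50 = n50([100, 80, 50, 30, 20, 20])
print(example_n50)
```

An assembly made of thousands of contigs barely longer than a single 50 bp read yields an N50 close to the read length, which is why such a value signals heavy fragmentation.
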
TABLE I

Features identified in the draft genome of the Leptospira borgpetersenii serovar Ballum strain 4E during the annotation

The genes that presented missense mutations in the variant calling analysis are displayed in Table II, and their locations in the genome of L. borgpetersenii strain 4E are illustrated in the Figure. A total of 41 genes were predicted to be affected by missense mutations, although 33 of them had only one mutation. One of the genes, LB4E_3373, which encodes a protein from the PF07598 family, presented 27 missense SNPs compared with the genome of strain 56604. Orthologous genes from the PF07598 family have already been associated with adaptation to the host in L. interrogans and with regulation of gene expression during the life cycle and infection (Lehmann et al. 2013).
TABLE II

Genes containing missense mutations identified in the genome of Leptospira borgpetersenii strain 4E based on the variant calling analysis using the genome of L. borgpetersenii strain 56604 as reference

Locus tag (strain 56604) | Locus tag (strain 4E) | SNPs | Product
--- | --- | --- | ---
LBBP_04290 | LB4E_3373 | 27 | PF07598 family protein (a)
LBBP_02267 | LB4E_1801 | 10 | Hypothetical protein
LBBP_04295 | LB4E_3378 | 5 | Integrase core domain protein
LBBP_02266 | LB4E_1800 | 5 | M23 family peptidase (a)
LBBP_03954 | - | 3 | Hypothetical protein
LBBP_02437 | LB4E_1928 | 3 | Hypothetical protein
LBBP_01389 | LB4E_1117 | 3 | PPE protein (a)
LBBP_04424 | - | 2 | Hypothetical protein
LBBP_04423 | LB4E_3488 | 1 | Transcriptional regulator, Fis family
LBBP_04394 | LB4E_3464 | 1 | Putative EF-P lysine aminoacylase GenX
LBBP_04178 | LB4E_3280 | 1 | Transposase
LBBP_04103 | LB4E_3222 | 1 | AraC family transcriptional regulator
LBBP_03775 | - | 1 | Hypothetical protein
LBBP_03455 | LB4E_2709 | 1 | PF07600 family protein
LBBP_03226 | LB4E_2530 | 1 | Flagellin domain protein (a)
LBBP_02875 | LB4E_2269 | 1 | Hypothetical protein
LBBP_02823 | LB4E_2227 | 1 | Transposase
LBBP_02742 | LB4E_2163 | 1 | Dolichyl-phosphate-mannose-protein mannosyltransferase
LBBP_02514 | LB4E_1991 | 1 | Stage II sporulation protein E
LBBP_02460 | LB4E_1947 | 1 | Hypothetical protein
LBBP_02259 | LB4E_1792 | 1 | DNA-directed RNA polymerase subunit beta
LBBP_01965 | LB4E_1576 | 1 | Ribosomal RNA small subunit methyltransferase H
LBBP_01593 | LB4E_1288 | 1 | 1-aminocyclopropane-1-carboxylate deaminase
LBBP_01564 | LB4E_1267 | 1 | Tyrosine recombinase XerD
LBBP_01436 | LB4E_1154 | 1 | Oma87-like outer membrane protein
LBBP_01392 | LB4E_1120 | 1 | Hypothetical protein
LBBP_01368 | LB4E_1098 | 1 | Hypothetical protein
LBBP_01318 | - | 1 | Hypothetical protein
LBBP_01157 | LB4E_1157 | 1 | DNA repair protein RecN
LBBP_01063 | LB4E_0848 | 1 | tRNA nucleotidyltransferase/poly(A) polymerase family protein
LBBP_00977 | LB4E_0776 | 1 | Uncharacterized protein
LBBP_00916 | LB4E_0716 | 1 | Flagellar motor switch protein FliN
LBBP_00894 | LB4E_0702 | 1 | Transketolase, pyridine binding domain protein
LBBP_00821 | LB4E_0643 | 1 | Putative coproporphyrinogen dehydrogenase
LBBP_00739 | LB4E_0580 | 1 | Transposase
LBBP_00738 | - | 1 | Hypothetical protein
LBBP_00468 | - | 1 | Hypothetical protein
LBBP_00433 | LB4E_0356 | 1 | Hypothetical protein
LBBP_00376 | LB4E_0309 | 1 | Histidine kinase of a two-component regulator system
LBBP_00318 | LB4E_0266 | 1 | RND transporter, Hydrophobe/Amphiphile Efflux-1 (HAE1)/Heavy Metal Efflux (HME) family, permease protein
LBBP_00116 | LB4E_0102 | 1 | NUDIX hydrolase

(a) potentially related to pathogenesis.
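
As a consistency check on Table II, the per-gene SNP counts can be tallied with a short Python sketch. The list below is transcribed from the SNPs column of the table (eight genes with more than one missense SNP, followed by the single-SNP genes); the variable names are illustrative only.

```python
from collections import Counter

# SNP counts per gene, transcribed from the SNPs column of Table II.
snps_per_gene = [27, 10, 5, 5, 3, 3, 3, 2] + [1] * 33

genes_affected = len(snps_per_gene)
histogram = Counter(snps_per_gene)     # maps SNP count -> number of genes
single_snp_genes = histogram[1]
total_missense_snps = sum(snps_per_gene)

print(genes_affected, single_snp_genes, total_missense_snps)
```

The tally agrees with the text: 41 genes carry missense mutations and 33 of them carry exactly one.
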

Map of the two chromosomes of Leptospira borgpetersenii strain 4E. Genes identified as mutated (non-synonymous mutations) based on comparison with L. borgpetersenii strain 56604 are indicated in blue, and non-mutated genes are indicated in red.

Another highly mutated gene, LB4E_1801, contains 10 single-nucleotide polymorphisms, but its function remains unclear, and no BLAST hit in UniProt (Apweiler et al. 2004) allowed a deeper annotation or provided any clue regarding its molecular function. We also identified five mutations in a gene that encodes an M23 peptidase (LB4E_1800), which has already been associated with fibronectin binding in Leptospira and other closely related genera, such as Treponema, and may contribute to the pathogenesis process.

Although de novo assembly is usually preferred for microbial organisms, it is associated with many drawbacks in obtaining a finished genome (Miller et al. 2010). Therefore, reference-guided assembly, based on an already-finished genome, may be a more reasonable approach when a closely related reference is available. In our case, the 4E and 56604 strains belong to the same species and serovar, so a de novo assembly was not required. The SOLiD sequencing platform offers high throughput, short read length (50 bp) and high accuracy (Liu et al. 2012); as such, it is more suitable for re-sequencing and reference-guided assembly than for de novo assembly. The SOLiD sequencing process requires two hybridisation reactions to identify each base, so the probability of an erroneous identification or an artificial insertion/deletion tends to be much smaller than with other platforms, such as Illumina and IonTorrent. In fact, in cases of sequencing artefacts, decoding the colour-space data (csFASTA) to nucleotide-space format (FASTA), which is based on nucleotide transitions, would generate an apparently random sequence after the erroneous base position. Such a sequence would probably not align to the reference genome in the read mapping process (during a variant calling study) or be used in the assembly of a contig (in a de novo assembly).
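
The propagation of a single colour-call error can be sketched in Python. The sketch assumes the standard SOLiD two-base encoding, in which each colour (0 to 3) is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3), and a csFASTA read that starts with a known primer base which is dropped from the output. The function name and the example reads are hypothetical, and nothing here is taken from the cs2q script.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def decode_colour_space(read):
    """Decode a csFASTA read such as 'T3210' (primer base + colour calls).

    Each colour describes the transition between two adjacent bases, so every
    decoded base depends on all colour calls that precede it.
    """
    previous = BASES.index(read[0])        # known primer base
    decoded = []
    for colour in read[1:]:
        previous ^= int(colour)            # next base = previous base XOR colour
        decoded.append(BASES[previous])
    return "".join(decoded)


clean_read = "T32101230"
error_read = "T32201230"                   # third colour call miscalled (1 -> 2)

clean_seq = decode_colour_space(clean_read)
error_seq = decode_colour_space(error_read)
mismatches = [i for i, (a, b) in enumerate(zip(clean_seq, error_seq)) if a != b]
print(clean_seq, error_seq, mismatches)
```

In this toy case every base from the miscalled colour onward differs from the correct sequence, which is why such reads tend to fail alignment rather than produce false variant calls. Colour-space-aware mappers avoid this by aligning in colour space directly.
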
The reliability of this platform has already been demonstrated by previous studies, such as the benchmarking study performed by Ratan et al. (2013), which compared the accuracy of three different NGS platforms (ABI SOLiD, Illumina HiSeq and Roche 454 FLX) in the identification of SNPs in a human sample. In that study, the number of SNPs identified by SOLiD that were validated by mass spectrometry was higher than that observed for the other platforms. Therefore, although SOLiD is not a first option for microbial genomics, for which benchtop platforms are usually preferred, it may still be a valuable tool when aiming for a more accurate identification of mutations. Finally, de novo assembly of SOLiD data results in a more fragmented draft genome than assembly of data from other sequencing technologies, because the short read length makes it difficult for assembly algorithms to resolve repeated regions along the genome, which may be collapsed in the de Bruijn graph (Alkan et al. 2010); as such, this method would not be appropriate in this case.

In the context of Leptospira research, genomic data from highly virulent strains might provide useful information for the development of new vaccines and diagnostic methods and improve the understanding of bacterial pathogenesis and pathogen-host interactions. The presence of a high number of mutations in a gene that encodes a protein from the PF07598 family, which has already been suggested to be related to pathogenesis in previous studies, may be one of the reasons for the greater virulence observed in this strain, although further studies are necessary to validate this relationship. Additionally, the availability of a genomic characterisation of this strain might be useful for future epidemiological surveillance studies in southern Brazil.

Nucleotide sequence accession number - The complete genome of L. borgpetersenii strain 4E is available at GenBank under the accession codes CP015814.2 (chromosome I) and CP015815.2 (chromosome II). The raw reads from this sequencing project are available at the NCBI Sequence Read Archive under accession code SRR5266483.
  23 in total

1.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors:  Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal:  Genome Res       Date:  2010-07-19       Impact factor: 9.043

Review 2.  Leptospirosis.

Authors:  Marta A Guerra
Journal:  J Am Vet Med Assoc       Date:  2009-02-15       Impact factor: 1.936

3.  Highly virulent Leptospira borgpetersenii strain characterized in the hamster model.

Authors:  Juliana Alcoforado Diniz; Samuel Rodrigues Félix; Josiane Bonel-Raposo; Amilton Clair Pinto Seixas Neto; Flávia Aleixo Vasconcellos; André Alex Grassmann; Odir Antônio Dellagostin; José Antonio Guimarães Aleixo; Everton Fagonde da Silva
Journal:  Am J Trop Med Hyg       Date:  2011-08       Impact factor: 2.345

4.  Preliminary characterization of Mus musculus-derived pathogenic strains of Leptospira borgpetersenii serogroup Ballum in a hamster model.

Authors:  Everton F da Silva; Samuel R Félix; Gustavo M Cerqueira; Michel Q Fagundes; Amilton C P S Neto; André A Grassmann; Marta G Amaral; Tiago Gallina; Odir A Dellagostin
Journal:  Am J Trop Med Hyg       Date:  2010-08       Impact factor: 2.345

Review 5.  Leptospirosis: public health perspectives.

Authors:  Marta A Guerra
Journal:  Biologicals       Date:  2013-07-10       Impact factor: 1.856

Review 6.  Riboswitches: structures and mechanisms.

Authors:  Andrew D Garst; Andrea L Edwards; Robert T Batey
Journal:  Cold Spring Harb Perspect Biol       Date:  2011-06-01       Impact factor: 10.005

7.  The Sequence Alignment/Map format and SAMtools.

Authors:  Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

Review 8.  A decade of riboswitches.

Authors:  Alexander Serganov; Evgeny Nudler
Journal:  Cell       Date:  2013-01-17       Impact factor: 41.582

9.  Limitations of next-generation genome sequence assembly.

Authors:  Can Alkan; Saba Sajjadian; Evan E Eichler
Journal:  Nat Methods       Date:  2010-11-21       Impact factor: 28.547

10.  SNPeffect: a database mapping molecular phenotypic effects of human non-synonymous coding SNPs.

Authors:  Joke Reumers; Joost Schymkowitz; Jesper Ferkinghoff-Borg; Francois Stricher; Luis Serrano; Frederic Rousseau
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971
