Charlotte Couchoud1,2, Xavier Bertrand2,1, Benoit Valot3,2, Didier Hocquet2,1,3.
Abstract
Next-generation sequencing (NGS) is now widely used in microbiology to explore genome evolution and the structure of pathogen outbreaks. Bioinformatics pipelines readily detect single-nucleotide polymorphisms or short indels. However, bacterial genomes also evolve through the action of small transposable elements called insertion sequences (ISs), which are difficult to detect due to their short length and multiple repetitions throughout the genome. We designed panISa software for the ab initio detection of IS insertions in the genomes of prokaryotes. PanISa has been released as open source software (GPL3) available from https://github.com/bvalot/panISa. In this study, we assessed the utility of this software for evolutionary studies by reanalysing five published datasets for outbreaks of major human pathogens in which ISs had not been specifically investigated. We reanalysed the raw data from each study by aligning the reads against reference genomes and running panISa on the alignments. Each hit was automatically curated and IS-related events were validated on the basis of nucleotide sequence similarity, by comparison with the ISFinder database. In Acinetobacter baumannii, the panISa pipeline identified ISAba1 or ISAba125 upstream from the ampC gene, which encodes a cephalosporinase, in all third-generation cephalosporin-resistant isolates. In the genomes of Vibrio cholerae isolates, we found that early Haitian isolates had the same ISs as Nepalese isolates, confirming the inferred history of the contamination of this island. In Enterococcus faecalis, panISa identified regions of high plasticity, including a pathogenicity island enriched in IS-related events. The overall distribution of ISs deduced with panISa was consistent with SNP-based phylogenetic trees for all species considered. The role of ISs in pathogen evolution has probably been underestimated due to difficulties detecting these transposable elements. We show here that panISa is a useful addition to the bioinformatics toolbox for analyses of the evolution of bacterial genomes. PanISa will facilitate explorations of the functional impact of ISs and improve our understanding of prokaryote evolution.
Keywords: bacterial evolution; insertion sequence; outbreak; whole-genome sequencing
Year: 2020 PMID: 32213253 PMCID: PMC7371109 DOI: 10.1099/mgen.0.000356
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Collections of genomes of bacterial pathogens reanalysed with panISa software
| Reference | Reference genome (NCBI accession numbers) | Isolates: total no. | Isolates with available data |
|---|---|---|---|
| Eppinger | | 116 | 110 |
| Martinez-Urtaza | | 48 | 16 |
| Raven | | 168 | 168 |
| Wilson | | 69 | 68 |
| Holt | | 44 | 35 |
Result of the reanalysis of five genome collections with panISa.
| Reference | Species | Insertion events (n) | IS-related events (n) | ISs (n) | Proportion (%) of isolates with ≥1 IS |
|---|---|---|---|---|---|
| Eppinger | | 2913 | 207 | 5 | 91 |
| Martinez-Urtaza | | 692 | 0 | 0 | 0 |
| Raven | | 15 878 | 1348 | 29 | 100 |
| Wilson | | 727 | 4 | 1 | 1.4 |
| Holt | | 1371 | 345 | 19 | 86 |
The third column shows the number of insertion events identified by panISa from WGS datasets of bacterial pathogens, the fourth column gives the number of hits matching sequences in the ISFinder database, considered to correspond to IS-related events, the fifth column gives the number of different ISs found among the IS-related events, and the last column gives the proportion of isolates concerned.
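The relationship between the third and fourth columns can be made explicit with a little arithmetic. The sketch below is illustrative only and is not part of panISa: it takes the counts from the table above and computes, for each dataset, the share of panISa hits that matched an ISFinder sequence. The function name `validated_fraction` is a hypothetical helper introduced here.

```python
# Counts copied from the table above, per dataset:
# (insertion events reported by panISa, hits matching ISFinder).
datasets = {
    "Eppinger": (2913, 207),
    "Martinez-Urtaza": (692, 0),
    "Raven": (15878, 1348),
    "Wilson": (727, 4),
    "Holt": (1371, 345),
}


def validated_fraction(total_hits, is_related):
    """Percentage of panISa hits that were confirmed as IS-related events."""
    if total_hits == 0:
        return 0.0
    return 100.0 * is_related / total_hits


for name, (total, is_related) in datasets.items():
    share = validated_fraction(total, is_related)
    print(f"{name}: {share:.1f}% of panISa hits matched ISFinder")
```

In every dataset the validated IS-related events are a subset of the insertion events reported by panISa, which is why the comparison with ISFinder is a separate validation step.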
Fig. 1. Comparison of (a) phylogenetic analysis from Eppinger et al. [9] and (b) IS-related events identified by panISa. Red boxes represent insertions of IS1634-like elements and green boxes represent insertions of IS256-like elements. Each column represents an insertion site on a specific chromosome. The two colours refer to the two different ISs detected in the dataset, and the different shades of colours represent the different positions of insertion of each IS [at the chromosomal positions indicated at the bottom of (b)].
Fig. 2. Phylogenetic analysis of GC1. (a) Temporal and phylogenetic analysis performed by Holt et al. [13]. The dot colours represent the capsule type of each isolate. (b) Hierarchical clustering based on the presence/absence of 345 IS-related events implicating 19 ISs in 35 isolates of A. baumannii with available SRA data. Each of the 345 columns represents one IS-related event, with grey indicating no event, black indicating an event and white indicating that the sequence data could not be obtained. (c) Representation of the insertion of ISAba1 and ISAba125 upstream from ampC.
Fig. 3. Distribution of the IS-related events in the genomes of a collection of 168 clinical isolates of E. faecalis [11]. (a) Distribution of the number of IS-related events as a function of the number of isolates affected. The y-axis is drawn to a logarithmic scale. Red bars represent IS-related events that occurred >100 bp from a coding sequence or occurred in fewer than ten isolates. Black bars represent IS-related events that occurred within 100 bp of a coding sequence and are reported in Table 3. (b) The outer green circle represents the number of IS-related events, the red central circle indicates the number of genomes from the collection of 168 isolates of E. faecalis containing the region, and the inner grey circle represents GC-content relative to the mean value. The 150 kb pathogenicity island and the ICE (ICEEEfaV583-1) of V583 are indicated by the green and red sectors, respectively.
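The selection rule described for panel (a) of Fig. 3 (within 100 bp of a coding sequence and present in at least ten isolates) can be sketched as a small filter. This is a minimal illustration under stated assumptions, not the authors' code: the `Event` class, the helper names and the coordinates in the usage example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    position: int    # insertion coordinate on the reference genome
    n_isolates: int  # number of isolates carrying this insertion


def distance_to_cds(position, cds_start, cds_end):
    """Distance in bp from an insertion to a coding sequence (0 if inside it)."""
    if cds_start <= position <= cds_end:
        return 0
    return min(abs(position - cds_start), abs(position - cds_end))


def is_reported(event, cds_intervals, max_dist=100, min_isolates=10):
    """True if the event lies within max_dist bp of any CDS and
    affects at least min_isolates isolates (the black bars)."""
    near_cds = any(
        distance_to_cds(event.position, start, end) <= max_dist
        for start, end in cds_intervals
    )
    return near_cds and event.n_isolates >= min_isolates


# Hypothetical coordinates, for illustration only.
cds = [(1000, 2000), (5000, 6000)]
print(is_reported(Event(position=2050, n_isolates=12), cds))  # near a CDS, 12 isolates
print(is_reported(Event(position=3500, n_isolates=40), cds))  # too far from any CDS
```

Events failing either condition correspond to the red bars: they are counted in the distribution but not listed in Table 3.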
IS-related events common to at least ten of the 168 isolates from the Raven et al. dataset [11]
| Isolates (n) | IS | Position | Location relative to gene | Protein potentially affected by the IS insertion | Gene ID |
|---|---|---|---|---|---|
| 10 | IS | 499 863 | In | Hypothetical protein | gene540 |
| 10 | IS | 2 594 127 | Upstream | HAD superfamily hydrolase | gene2620 |
| | | | Downstream | Hypothetical protein | gene2621 |
| 10 | IS | 1 627 490 | In | ABC transporter ATP-binding protein | gene1652 |
| 10 | IS | 705 382 | Downstream | Cell wall surface anchor family protein | gene733 |
| 11 | IS | 491 272 | Upstream | Hypothetical protein | gene530 |
| 11 | IS | 991 057 | In | Phosphorylase | gene1006 |
| 11 | IS | 2 443 962 | Downstream | Conjugal transfer protein | gene2462 |
| 12 | IS | 1 337 504 | Upstream | Hypothetical protein | gene1348 |
| | | | Downstream | Hydroxymethylglutaryl-CoA synthase | gene1349 |
| 13 | IS | 1 224 046 | Downstream | ABC transporter ATP-binding protein | gene1240 |
| 14 | IS | 218 868 | Upstream | Hypothetical protein | gene222 |
| 14 | IS | 832 847 | Upstream | Potassium uptake protein | gene850 |
| 19 | IS | 2 650 256 | Upstream | Lipoate-protein ligase A | gene2676 |
| 21 | IS | 2 594 120 | Downstream | HAD superfamily hydrolase | gene2620 |
| | | | Upstream | Hypothetical protein | gene2621 |
| 21 | IS | 2 809 601 | Downstream | Hypothetical protein | gene2863 |
| | | | Upstream | Valyl-tRNA synthetase | gene2864 |
| 29 | IS | 352 544 | In | DadA family oxidoreductase | gene418 |
| 31 | IS | 2 404 627 | Upstream | Hypothetical protein | gene2430 |
| 32 | IS | 1 317 331 | In | Hypothetical protein | gene1332 |
| 34 | IS | 2 594 130 | Downstream | HAD superfamily hydrolase | gene2620 |
| | | | Upstream | Hypothetical protein | gene2621 |
| 37 | IS | 1 954 670 | In | Hypothetical protein | gene1987 |
| 39 | IS | 1 805 579 | In | 3-methyl-2-oxobutanoate hydroxymethyltransferase | gene1826 |
| 43 | IS | 608 941 | In | DeoR family transcriptional regulator | gene644 |
| 43 | IS | 641 514 | Upstream | Rotamase | gene672 |
The second and third columns give the name of the IS and the position of its insertion in the reference genome E. faecalis V583. The fourth column gives the position of the IS insertion in relation to the gene potentially affected, and the fifth column gives the function of the protein and the gene ID of the gene potentially affected by the IS. More detailed information is given in Data S5.