gplas: a comprehensive tool for plasmid analysis using short-read graphs.

Sergio Arredondo-Alonso, Martin Bootsma, Yaïr Hein, Malbert R C Rogers, Jukka Corander, Rob J L Willems, Anita C Schürch.

Abstract

SUMMARY: Plasmids can horizontally transmit genetic traits, enabling rapid bacterial adaptation to new environments and hosts. Short-read whole-genome sequencing data are often applied to large-scale bacterial comparative genomics projects, but the reconstruction of plasmids from these data faces severe limitations, such as the inability to distinguish plasmids from each other in a bacterial genome. We developed gplas, a new approach to reliably separate plasmid contigs into discrete components using sequence composition, coverage, assembly graph information and network partitioning based on a pruned network of plasmid unitigs. Gplas facilitates the analysis of large numbers of bacterial isolates and allows a detailed analysis of plasmid epidemiology based solely on short-read sequence data.
AVAILABILITY AND IMPLEMENTATION: Gplas is written in R and Bash and uses a Snakemake pipeline as a workflow management system. Gplas is available under the GNU General Public License v3.0 at https://gitlab.com/sirarredondo/gplas.git.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press.

Year:  2020        PMID: 32271863      PMCID: PMC7320608          DOI: 10.1093/bioinformatics/btaa233

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

A single bacterial cell can harbor several distinct plasmids; however, current tools for plasmid prediction from short-read WGS often have a binary outcome (plasmid or chromosome). To bin predicted plasmids into discrete entities, we built a new method based on the following concepts: (i) contigs of the same plasmid have a uniform sequence coverage (Antipov et al., 2016; Rozov et al., 2017), (ii) plasmid paths in the assembly graph can be searched for using a greedy approach (Müller and Chauve, 2019) and (iii) removal of repeat units from the plasmid graphs disconnects the graph into independent components (Vielva et al., 2017). Here, we refined these ideas and introduce the concept of unitig co-occurrence to create a pruned plasmidome network. Using an unsupervised approach, the network is queried to find highly connected nodes corresponding to sequences that belong to the same discrete plasmid unit, representing a single plasmid. We show that our approach outperforms other de novo and reference-based tools and fully automates the reconstruction of plasmids from short reads.

2 Materials and methods

2.1 Gplas algorithm

Given a short-read assembly graph (GFA format), segments (nodes) and edges (links) are extracted from the graph. Gplas uses mlplasmids (version 1.0.0, prediction threshold = 0.5) or plasflow (version 1.1, prediction threshold = 0.7) to classify segments as plasmid- or chromosome-derived and selects segments with an in- and out-degree of 1 (unitigs) (Arredondo-Alonso et al., 2018; Krawczyk et al., 2018). The SD of the k-mer coverage of the chromosome-derived unitigs is computed to quantify the fluctuation in the coverage of segments belonging to the same replicon unit. Plasmid-derived unitigs are used as seeds to search for plasmid walks with a similar coverage and composition using a greedy approach (Supplementary Methods S1). Gplas creates a plasmidome network (undirected graph) in which nodes correspond to plasmid unitigs and edges are created and weighted based on the co-existence of the nodes in the solution space of the computed walks. Modularity values computed using a selection of partitioning algorithms (Blondel et al.; Newman, 2006; Pons and Latapy, 2005) are used in a voting decision on whether to split the components of the undirected network into different bins (subcomponents) (Supplementary Methods S1). These bins represent the set of plasmids present in the bacterial isolate and are plotted in the plasmidome network using the igraph R package (Csardi et al.). The pseudocode and formalization of the algorithm are available in Algorithm 1 and Supplementary Methods S1, respectively.
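The network step described above can be illustrated with a minimal Python sketch. It is not the gplas implementation (gplas is written in R and Bash, and its edge weighting and partitioning are formalized in Supplementary Methods S1, which is not reproduced here). The sketch assumes that an edge weight is simply the number of walks in which two unitigs appear together, and it uses plain connected components in place of the modularity-based voting. The function names, the `min_weight` cutoff and the unitig identifiers are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(walks):
    """Count, for every unordered pair of unitigs, the walks containing both.

    Assumption: this simple count stands in for the gplas edge weights.
    """
    weights = defaultdict(int)
    for walk in walks:
        for a, b in combinations(sorted(set(walk)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def connected_components(nodes, weights, min_weight=1):
    """Group nodes linked by edges of weight >= min_weight (hypothetical cutoff)."""
    adjacency = {node: set() for node in nodes}
    for (a, b), weight in weights.items():
        if weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Toy walks over plasmid unitigs (made-up identifiers).
walks = [["u1", "u2", "u3"], ["u1", "u3"], ["u4", "u5"], ["u5", "u4"]]
weights = cooccurrence_network(walks)
nodes = {unitig for walk in walks for unitig in walk}
bins = connected_components(nodes, weights)
print(weights)
print(bins)
```

In this toy input, unitigs that never share a walk end up in different bins. Raising `min_weight` prunes weakly supported edges, which is one simple way to picture how a pruned network can separate plasmids that are only loosely connected.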

2.2 Benchmarking dataset

Gplas was benchmarked against existing tools to bin plasmid contigs from short-read WGS: (i) plasmidSPAdes (de novo-based approach, version 3.12) (Antipov et al., 2016), (ii) mob-recon (reference-based approach, version 1.4.9.1) (Robertson and Nash, 2018) and (iii) hyasp (hybrid approach, version 1.0.0) (Müller and Chauve, 2019). To evaluate the binning tools, we selected a set of 28 genomes with both short- and long-read WGS data available, including 106 plasmids from 9 different bacterial species, which were not present in the databases or training sets of the tools (Supplementary Methods S3 and Table S1) (Arredondo-Alonso et al.; De Maio et al., 2019; Decano et al., 2019; Wick et al.). Let nbin be the total number of nodes present in a predicted bin and define ref as the reference replicon sequence with the highest number of nodes in that bin. Let nref be the total number of nodes that make up ref. We then define two metrics commonly used in metagenomics for binning evaluation: (i) precision and (ii) completeness (Supplementary Methods S4).
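The exact formulas are given in Supplementary Methods S4 and are not reproduced here. The sketch below assumes the usual binning definitions: precision is the number of nodes in the bin that belong to ref divided by nbin, and completeness is that same number divided by nref. The function name and the example labels are hypothetical.

```python
from collections import Counter


def bin_metrics(bin_labels, replicon_sizes):
    """Return (ref, precision, completeness) for one predicted bin.

    bin_labels: true replicon label of every node in the bin (length = nbin).
    replicon_sizes: total number of nodes of each reference replicon (nref).
    Assumed definitions: precision = shared / nbin, completeness = shared / nref,
    where shared is the number of bin nodes that belong to ref.
    """
    n_bin = len(bin_labels)
    # ref is the replicon contributing the most nodes to this bin.
    ref, n_shared = Counter(bin_labels).most_common(1)[0]
    precision = n_shared / n_bin
    completeness = n_shared / replicon_sizes[ref]
    return ref, precision, completeness


# Made-up example: a bin of 4 nodes, 3 of them from plasmid pA (5 nodes in total).
ref, precision, completeness = bin_metrics(
    ["pA", "pA", "pA", "chromosome"], {"pA": 5, "chromosome": 120}
)
print(ref, precision, completeness)
```

Under these assumed definitions, a chimeric bin lowers precision, whereas a plasmid split across several bins lowers completeness.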

3 Results

Gplas in combination with mlplasmids obtained the highest average precision (0.88), indicating that the predicted components were mostly formed by nodes belonging to the same discrete plasmid unit (Table 1 and Supplementary Fig. S1). The reported average completeness value (0.79) showed that most of the nodes from a single plasmid were recovered as a discrete plasmid bin by gplas (Table 1 and Supplementary Fig. S2). We observed a decline in the performance of gplas in combination with mlplasmids (precision = 0.82, completeness = 0.72) when considering only bins with a size larger than one, which indicated problems with the merging of large plasmids with a similar k-mer coverage (Supplementary Fig. S3 and Results S2). However, in all cases, gplas in combination with mlplasmids performed better than the other de novo and reference-based tools tested here (Table 1). To show the potential of gplas in combination with mlplasmids, we showcase the performance of our approach in two distinct bacterial isolates (Supplementary Results S1 and S2).

Table 1. Gplas benchmarking

Tool               Precision       Completeness    Bin size
gplas–mlplasmids   0.88/0.82(a)    0.79/0.72(a)    6.02/10.9(a)
gplas–plasflow     0.62/0.45(a)    0.52/0.32(a)    7.17/11.1(a)
hyasp              0.64/0.56(a)    0.36/0.30(a)    3.84/5.65(a)
mob-recon          0.79/0.71(a)    0.56/0.51(a)    3.4/7.22(a)
plasmidSPAdes      0.52/0.27(a)    0.56/0.38(a)    6.99/13.7(a)

(a) Components >1 node.

Mlplasmids only contains a limited range of species models (Supplementary Methods). For other bacterial species, we observed that plasflow probabilities in combination with gplas performed similarly to the other de novo approaches but also introduced bias by wrongly predicting chromosome contigs as plasmid nodes (Table 1 and Supplementary Fig. S1), thereby creating bins corresponding to chromosome and plasmid chimeras (precision = 0.62).

4 Discussion

We present a new tool called gplas, which bins binary-classified plasmid contigs into discrete plasmid units and provides a detailed analysis workflow, relying on the structure of the assembly graph, k-mer information and partitioning of a pruned plasmidome network. A limitation of the presented approach is the generation of chimeras from plasmids that have similar k-mer profiles and k-mer coverage and that share repeat unit(s), such as a transposase or an IS element. These cases cannot be unambiguously resolved. Here, we integrated and extended existing features to predict plasmid sequences and exploited the information present in short-read graphs to automate the reconstruction of plasmids.

References

1. M E J Newman. Finding community structure in networks using the eigenvectors of matrices. Phys Rev E Stat Nonlin Soft Matter Phys, 2006.

2. Robert Müller; Cedric Chauve. HyAsP, a greedy tool for plasmids identification. Bioinformatics, 2019.

3. Dmitry Antipov; Nolan Hartwick; Max Shen; Mikhail Raiko; Alla Lapidus; Pavel A Pevzner. plasmidSPAdes: assembling plasmids from whole genome sequencing data. Bioinformatics, 2016.

4. Luis Vielva; María de Toro; Val F Lanza; Fernando de la Cruz. PLACNETw: a web-based tool for plasmid reconstruction from bacterial genomes. Bioinformatics, 2017.

5. Roye Rozov; Aya Brown Kav; David Bogumil; Naama Shterzer; Eran Halperin; Itzhak Mizrahi; Ron Shamir. Recycler: an algorithm for detecting plasmids from de novo assembly graphs. Bioinformatics, 2017.

6. Arun Gonzales Decano; Catherine Ludden; Theresa Feltwell; Kim Judge; Julian Parkhill; Tim Downing. Complete Assembly of Escherichia coli Sequence Type 131 Genomes Using Long Reads Demonstrates Antibiotic Resistance Gene Variation within Diverse Plasmid and Chromosomal Contexts. mSphere, 2019.

7. Nicola De Maio; Liam P Shaw; Alasdair Hubbard; Sophie George; Nicholas D Sanderson; Jeremy Swann; Ryan Wick; Manal AbuOun; Emma Stubberfield; Sarah J Hoosdally; Derrick W Crook; Timothy E A Peto; Anna E Sheppard; Mark J Bailey; Daniel S Read; Muna F Anjum; A Sarah Walker; Nicole Stoesser. Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes. Microb Genom, 2019.

8. Pawel S Krawczyk; Leszek Lipinski; Andrzej Dziembowski. PlasFlow: predicting plasmid sequences in metagenomic data using genome signatures. Nucleic Acids Res, 2018.

9. Sergio Arredondo-Alonso; Malbert R C Rogers; Johanna C Braat; Tess D Verschuuren; Janetta Top; Jukka Corander; Rob J L Willems; Anita C Schürch. mlplasmids: a user-friendly tool to predict plasmid- and chromosome-derived sequences for single species. Microb Genom, 2018.

10. S Arredondo-Alonso; J Top; R J L Willems; J Corander; A C Schürch; A McNally; S Puranen; M Pesonen; J Pensar; P Marttinen; J C Braat; M R C Rogers; W van Schaik; S Kaski. Plasmids Shaped the Recent Emergence of the Major Nosocomial Pathogen Enterococcus faecium. mBio, 2020.
