NGMASTER: in silico multi-antigen sequence typing for Neisseria gonorrhoeae.

Jason C Kwong, Anders Gonçalves da Silva, Kristin Dyet, Deborah A Williamson, Timothy P Stinear, Benjamin P Howden, Torsten Seemann.

Abstract

Whole-genome sequencing (WGS) provides the highest resolution analysis for comparison of bacterial isolates in public health microbiology. However, although increasingly being used routinely for some pathogens such as Listeria monocytogenes and Salmonella enterica, the use of WGS is still limited for other organisms, such as Neisseria gonorrhoeae. Multi-antigen sequence typing (NG-MAST) is the most widely performed typing method for epidemiological surveillance of gonorrhoea. Here, we present NGMASTER, a command-line software tool for performing in silico NG-MAST on assembled genome data. NGMASTER rapidly and accurately determined the NG-MAST of 630 assembled genomes, facilitating comparisons between WGS and previously published gonorrhoea epidemiological studies. The source code and user documentation are available at https://github.com/MDU-PHL/ngmaster.

Keywords:  In silico typing; Multi-antigen sequence typing; NG-MAST; Neisseria gonorrhoeae; Whole-genome sequencing

Year:  2016        PMID: 28348871      PMCID: PMC5320595          DOI: 10.1099/mgen.0.000076

Source DB:  PubMed          Journal:  Microb Genom        ISSN: 2057-5858


The Python source code for NGMASTER is available from GitHub under the GNU GPL v2 (https://github.com/MDU-PHL/ngmaster). The software is installable via the Python ‘pip’ package management system, using ‘pip install --user git+https://github.com/MDU-PHL/ngmaster.git’. Sequencing data used are available for download from the EBI European Nucleotide Archive under BioProject accessions PRJEB2999, PRJNA29335, PRJNA266539, PRJNA298332 and PRJEB14168.

Impact Statement

Whole-genome sequencing (WGS) offers the potential for high-resolution comparative analyses of microbial pathogens. However, there remains a need for backward compatibility with previous molecular typing methods to place genomic studies in context. NG-MAST is currently the most widely used method for epidemiological surveillance of Neisseria gonorrhoeae. We present NGMASTER, a command-line software tool for performing multi-antigen sequence typing (NG-MAST) of Neisseria gonorrhoeae from WGS data. This tool is targeted at clinical and research microbiology laboratories that have performed WGS of N. gonorrhoeae isolates and wish to understand the molecular context of their data in comparison to previously published epidemiological studies. As WGS becomes more routinely performed, NGMASTER has been developed to completely replace PCR-based NG-MAST, reducing time and labour costs.

Introduction

Neisseria gonorrhoeae causes one of the most common sexually transmitted bacterial infections worldwide. There is growing concern about the global spread of resistant epidemic clones, with extensively drug-resistant gonorrhoea being listed as an urgent antimicrobial resistance threat (CDC, 2013; WHO, 2014). Multi-antigen sequence typing of N. gonorrhoeae (NG-MAST) has been important in tracking these resistant clones, such as the NG-MAST 1407 clone associated with decreased susceptibility to third-generation cephalosporins (Unemo & Dillon, 2011). It involves sequence-based typing, using established PCR primers, of two highly variable and polymorphic outer membrane protein genes, porB and tbpB, followed by comparison of the sequences to an open-access database (http://www.ng-mast.net/) (Martin). Although NG-MAST is the most frequently performed molecular typing method for N. gonorrhoeae, it requires multiple PCR amplification and sequencing reactions, making it more laborious than other gonococcal typing methods such as single porB gene sequencing, or fragment analysis methods such as multiple locus variable-number tandem repeat analysis (MLVA) (Heymans).

Whole-genome sequencing (WGS) is increasingly being used for molecular typing and epidemiological investigation of microbial pathogens as it provides considerably higher resolution. A number of studies using genomic data to understand the epidemiology of N. gonorrhoeae have already been published (Grad; Demczuk; Ezewudo; Demczuk). However, the ability to perform retrospective comparisons with previous epidemiological studies relies on conducting both traditional typing (such as NG-MAST) and more modern WGS analyses on the same isolates. NGMASTER is a command-line software tool for rapidly determining NG-MAST types in silico from genome assemblies of N. gonorrhoeae.

Description

NGMASTER is an open-source tool written in Python and released under a GPLv2 Licence. The source code can be downloaded from GitHub (https://github.com/MDU-PHL/ngmaster). It has two software dependencies, isPcr and BioPython (Cock), and uses the allele databases publicly available at http://www.ng-mast.net/, which NGMASTER can automatically download and update locally. NGMASTER is based on the laboratory method published by Martin, and uses isPcr to retrieve allele sequences from a user-specified genome assembly in FASTA format by locating the flanking primers. These allele sequences are trimmed to a set length from starting key motifs in conserved gene regions, and then checked against the allele databases. Results are printed in machine-readable tab- or comma-separated format.
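
The trimming and lookup step described above can be illustrated with a minimal Python sketch. It is not NGMASTER's code: the function names, the motif, the allele length and the toy database are all hypothetical values chosen for illustration, and whether the real tool keeps the motif inside the trimmed allele is an assumption here.

```python
from typing import Dict, Optional

# Hypothetical illustration values; not taken from NGMASTER or the NG-MAST database.
KEY_MOTIF = "TTGAA"
ALLELE_LENGTH = 12


def trim_allele(amplicon: str, motif: str = KEY_MOTIF,
                length: int = ALLELE_LENGTH) -> Optional[str]:
    """Return `length` bases starting at the first occurrence of `motif`.

    Returns None if the motif is absent (for example, mutated) or if the
    amplicon is too short to yield a full-length allele.
    """
    sequence = amplicon.upper()
    start = sequence.find(motif)
    if start == -1:
        return None
    trimmed = sequence[start:start + length]
    return trimmed if len(trimmed) == length else None


def lookup_allele(trimmed: Optional[str], database: Dict[str, str]) -> str:
    """Map a trimmed sequence to an allele number.

    "-" marks an allele that could not be extracted and "new" marks a
    sequence absent from the database (both labels are assumptions).
    """
    if trimmed is None:
        return "-"
    return database.get(trimmed, "new")


if __name__ == "__main__":
    toy_db = {"TTGAAACGTACG": "1"}      # sequence -> allele number
    amplicon = "ggcaTTGAAACGTACGttt"    # stand-in for an isPcr product
    print(lookup_allele(trim_allele(amplicon), toy_db))
```

In this sketch a missing motif yields no allele call, which mirrors the kind of incomplete result that arises when the conserved starting motif is altered.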

Methods

NGMASTER was validated against 630 publicly available N. gonorrhoeae genome sequences derived from published studies (Table 1). A PubMed search for published studies with N. gonorrhoeae whole-genome sequencing data was conducted (on 4 May 2016) using the search terms ‘Neisseria gonorrhoeae’ and ‘whole-genome sequencing’. We excluded studies with fewer than 20 isolates, and those that did not publish NG-MAST results or make their raw sequencing data available. Our search identified three studies, contributing 572 sequences for testing that had undergone manual in silico NG-MAST from WGS data (Demczuk; Demczuk; Grad), including the fully assembled reference genome NCCP11945 (Chung). The panel of isolates also included the genome sequencing data for eight well-characterised WHO reference genomes with published NG-MAST results. Raw WGS data for these sequences were retrieved from the European Nucleotide Archive (ENA). Average sequencing depth was >30× for all ENA sequences, with a combination of 100 bp, 250 bp and 300 bp paired-end Illumina reads. In addition, we tested 50 local isolates that had undergone ‘traditional’ NG-MAST by PCR and Sanger sequencing (Martin). These isolates underwent WGS on the Illumina MiSeq/NextSeq using Nextera libraries and manufacturer protocols, each with an average sequencing depth >50×. The raw sequencing reads for these local isolates have been uploaded to the ENA (BioProject accession PRJEB14168).

Table 1. Concordance between NGMASTER results from draft genome assemblies using MEGAHIT and SPAdes, and previously published NG-MAST results

BioProject      MEGAHIT       SPAdes        Two-stage¶    Total
PRJEB2999*      176 (95 %)    184 (99 %)    184 (99 %)    186
PRJNA29335†                                               1
PRJNA266539‡    162 (91 %)    169 (94 %)    178 (99 %)    179
PRJNA298332§    199 (93 %)    207 (97 %)    208 (97 %)    214
PRJEB14168||    50 (100 %)    50 (100 %)    50 (100 %)    50
Total           587 (93 %)    610 (97 %)    620 (98 %)    630

*Grad
†Demczuk
‡Demczuk
§Closed reference genome NCCP11945 (GenBank accession CP001050.1); in silico NG-MAST results reported by Demczuk.
||Local isolates with NG-MAST performed by PCR/Sanger sequencing.
¶Two-stage assembly: 1. NGMASTER run using rapid assembly with MEGAHIT; 2. NGMASTER also run using SPAdes if there was no result or a mixed result using MEGAHIT assembly.
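
As a check on the arithmetic, the overall percentages in the Total row can be recomputed from the per-project counts. The short Python sketch below does this; it assumes that the single closed genome (PRJNA29335) has no per-assembler counts in the table and therefore contributes only to the denominator of 630. The names `TABLE_1`, `CLOSED_GENOMES` and `overall_concordance` are illustrative.

```python
# Counts copied from Table 1: (MEGAHIT, SPAdes, two-stage, total genomes).
TABLE_1 = {
    "PRJEB2999": (176, 184, 184, 186),
    "PRJNA266539": (162, 169, 178, 179),
    "PRJNA298332": (199, 207, 208, 214),
    "PRJEB14168": (50, 50, 50, 50),
}

# Assumption: the closed NCCP11945 genome (PRJNA29335) has no per-assembler
# counts in the table, so it is added to the denominator only.
CLOSED_GENOMES = 1


def overall_concordance(table, extra_genomes=0):
    """Return {method: (concordant, total, rounded percentage)}."""
    total = sum(row[3] for row in table.values()) + extra_genomes
    summary = {}
    for column, method in enumerate(("MEGAHIT", "SPAdes", "Two-stage")):
        concordant = sum(row[column] for row in table.values())
        summary[method] = (concordant, total, round(100 * concordant / total))
    return summary


if __name__ == "__main__":
    for method, (hits, total, pct) in overall_concordance(TABLE_1, CLOSED_GENOMES).items():
        print(f"{method}: {hits}/{total} ({pct} %)")
```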

Sequencing reads were trimmed to clip Illumina adapters and low-quality sequences (minimum Q20) using Trimmomatic v0.35 (Bolger). Draft genomes were assembled de novo with MEGAHIT v1.0.3 (Li) and SPAdes v3.7.1 (Bankevich) to investigate whether the faster, but approximate genome assembler, MEGAHIT, would be sufficient for NGMASTER. A list of the commands and parameters used is included in Appendix 1. The de novo assembled draft genomes and the fully assembled NCCP11945 reference genome in FASTA format were used as input to NGMASTER with the overall results shown in Table 1. Complete NGMASTER results with sequencing and assembly metrics are included in Appendix 2. Running NGMASTER on 630 genome assemblies using a single Intel(R) Xeon(R) 2.3GHz CPU core was completed in less than two minutes.

Overall, NGMASTER assigned NG-MAST types that were concordant with published results for 93 % of the tested N. gonorrhoeae genomes using MEGAHIT assemblies, and 97 % using SPAdes assemblies. Notably, comparisons with results from traditional NG-MAST were 100 % concordant (58/58), including 50 local isolates and the eight well-characterised WHO reference isolates. Reasons for discordant results are shown in Table 2.

In general, running NGMASTER using SPAdes assemblies resolved more NG-MAST types than when using MEGAHIT assemblies. However, ten genomes assembled with SPAdes v3.7.1 were found to have assembly errors in either por or tbpB introduced at the repeat resolution stage of the SPAdes assembly process, resulting in discordant NG-MAST types for those isolates (major errors). Running NGMASTER on preliminary contigs prior to this process (in particular, on the ‘before_rr.fasta’ intermediate file generated by SPAdes in the assembly output folder) or disabling repeat resolution using the flag ‘--disable-rr’ when running SPAdes alleviated these major errors, giving results that were concordant with MEGAHIT results and the published results (Appendix 2).

In contrast, minor errors (due to incomplete NG-MAST types or multiple alleles detected) were more frequent using MEGAHIT assemblies, particularly those with poor assembly metrics (e.g. >500 contigs, N50<10 kbp). When MEGAHIT assemblies successfully produced complete NGMASTER results, these NG-MAST types were highly concordant with the published results.
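
The assembly metrics mentioned above (contig count and N50) are straightforward to compute from contig lengths. The following Python sketch shows one way to flag assemblies that may give incomplete results; the thresholds are the example values quoted in the text, while combining them with a logical "or" and the function names are assumptions of this sketch rather than part of NGMASTER.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length of the contig at which the cumulative sum of lengths,
    taken from longest to shortest, first reaches half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        return 0
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


def is_poor_assembly(contig_lengths: Iterable[int],
                     max_contigs: int = 500,
                     min_n50: int = 10_000) -> bool:
    """Flag an assembly with more than `max_contigs` contigs or an N50
    below `min_n50` bp (combining the two with "or" is an assumption)."""
    lengths = list(contig_lengths)
    return len(lengths) > max_contigs or n50(lengths) < min_n50


if __name__ == "__main__":
    print(n50([40, 30, 20, 10]))
    print(is_poor_assembly([2_000] * 600))
```

Contig lengths could be read from a FASTA file with any parser; that step is omitted here to keep the sketch dependency-free.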

Table 2. Reasons for discordant results between NGMASTER and published data using SPAdes assemblies

Reason for discordant result                       MEGAHIT    SPAdes
Major errors (incorrect result)
  Assembly error                                   0          10
Minor errors (incomplete/missing result)
  Alternate conserved key motif                    1          1
  Multiple alleles detected                        6          2
  Allele not detected                              29         0
Errors in published data
  Possible sequence mix-up in published data       4          4
  Probable transcription error in published data   1          1
  Error in published data                          1          1

To overcome this issue, a two-stage assembly approach was also tested, where a draft genome was first assembled using MEGAHIT for initial testing. If a complete NG-MAST result was obtained, this was recorded as the final result for that isolate. If the result was incomplete or suggested multiple alleles were present, the genome was also assembled using SPAdes. Using this combined approach, 620 out of 630 (98 %) NG-MAST types derived from NGMASTER were concordant with the published results, with only 42 genomes requiring additional assembly with the slower SPAdes assembler. For the remaining ten discordant results, seven of these were likely to be due to errors in the published data, including for NCCP11945. A further two isolates were found to have multiple tbpB alleles in both SPAdes and MEGAHIT assemblies, with the dominant allele (indicated by higher read coverage and better flanking assembly) matching the published result. The tbpB allele for the final isolate was not able to be determined by NGMASTER due to a mutation in the conserved starting key motif required for sequence trimming to a standard size.
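
The decision logic of the two-stage approach can be sketched in a few lines of Python. The assemblers and the typing step are passed in as callables so that the sketch stays independent of any particular tool; the result convention (None for a missing allele, a comma-separated string for multiple alleles) and all function names are assumptions made for illustration, not NGMASTER's actual output format.

```python
from typing import Callable, Optional, Tuple

# (porB allele, tbpB allele). None marks an allele that could not be called;
# a comma-separated string marks multiple alleles. Both are assumptions.
Result = Tuple[Optional[str], Optional[str]]


def is_complete(result: Result) -> bool:
    """True when both alleles were called and neither is ambiguous."""
    return all(allele is not None and "," not in allele for allele in result)


def two_stage_typing(reads: str,
                     fast_assemble: Callable[[str], str],
                     slow_assemble: Callable[[str], str],
                     type_assembly: Callable[[str], Result]) -> Tuple[Result, str]:
    """Type the fast assembly first; fall back to the slower assembler only
    if the first result is incomplete or suggests multiple alleles."""
    first = type_assembly(fast_assemble(reads))
    if is_complete(first):
        return first, "fast"
    return type_assembly(slow_assemble(reads)), "slow"


if __name__ == "__main__":
    # Stub callables standing in for MEGAHIT, SPAdes and NGMASTER.
    result, stage = two_stage_typing(
        "sample_reads",
        fast_assemble=lambda reads: "fast.fasta",
        slow_assemble=lambda reads: "slow.fasta",
        type_assembly=lambda fasta: ("1", None) if fasta == "fast.fasta" else ("1", "2"),
    )
    print(result, stage)
```

Because the slower assembler runs only on the fallback path, most genomes incur just the cost of the fast assembly, which is the trade-off the two-stage approach is designed to exploit.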

Issues with implementation

The NG-MAST procedure involves sequencing the internal regions of por and tbpB that encode two variable outer membrane proteins. The sequences are trimmed to a standard length from a starting key motif in conserved regions of each gene. However, despite being relatively conserved, a number of variations of this starting motif appear in the NG-MAST database (Fig. 1), causing one discordant result (Table 2). Some sequences appeared to lack a tbpB gene due to the presence of non-typeable tbpB genes acquired from N. meningitidis, though this was also noted in the published data. Another source of discordant results was genomes that appeared to have multiple alleles, suggesting isolate contamination or polyclonal infection.

Fig. 1. Number and frequency of alternate starting key motifs within ‘conserved’ gene regions for trimming allele sequences.

A number of isolates were found to have novel alleles or allele combinations that were not in the most recent version of the database available at http://www.ng-mast.net. For convenience, NGMASTER includes an option to save these allele sequences in FASTA format for manual submission to the database and allele type assignment.

Notably, results were dependent on the accuracy and quality of the de novo draft genome assembly. It should be noted that for this study, draft genomes were assembled de novo using relatively standard parameters for MEGAHIT and SPAdes without post-assembly error checking (see Appendix 1). We were alerted to the presence of SPAdes assembly errors after finding the corresponding MEGAHIT assemblies produced different NGMASTER results. Concordant results were obtained for each of these genomes after identifying and correcting assembly errors through re-mapping each isolate’s sequencing reads back to the respective draft SPAdes assembly (see Appendix 1). These errors introduced during the SPAdes assembly process can also be corrected using an assembly polishing tool such as Pilon (Walker). Assuming accurate closed genome assemblies are used with an accurate and well curated database, based on our testing, we anticipate that NGMASTER would produce NG-MAST results that were >99 % if not 100 % accurate.

Conclusion

NGMASTER rapidly and accurately performs in silico NG-MAST typing of N. gonorrhoeae from assembled WGS data, and may be a useful command-line tool to help contextualise genomic epidemiological studies of N. gonorrhoeae.

References

1.  Evaluation of Neisseria gonorrhoeae multiple-locus variable-number tandem-repeat analysis, N. gonorrhoeae Multiantigen sequence typing, and full-length porB gene sequence analysis for molecular epidemiological typing.

Authors:  Raymond Heymans; Daniel Golparian; Sylvia M Bruisten; Leo M Schouls; Magnus Unemo
Journal:  J Clin Microbiol       Date:  2011-11-09       Impact factor: 5.948

2.  Rapid sequence-based identification of gonococcal transmission clusters in a large metropolitan area.

Authors:  Iona M C Martin; Catherine A Ison; David M Aanensen; Kevin A Fenton; Brian G Spratt
Journal:  J Infect Dis       Date:  2004-03-31       Impact factor: 5.226

3.  MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph.

Authors:  Dinghua Li; Chi-Man Liu; Ruibang Luo; Kunihiko Sadakane; Tak-Wah Lam
Journal:  Bioinformatics       Date:  2015-01-20       Impact factor: 6.937

4.  Review and international recommendation of methods for typing Neisseria gonorrhoeae isolates and their implications for improved knowledge of gonococcal epidemiology, treatment, and biology.

Authors:  Magnus Unemo; Jo-Anne R Dillon
Journal:  Clin Microbiol Rev       Date:  2011-07       Impact factor: 26.132

5.  Whole-genome phylogenomic heterogeneity of Neisseria gonorrhoeae isolates with decreased cephalosporin susceptibility collected in Canada between 1989 and 2013.

Authors:  Walter Demczuk; Tarah Lynch; Irene Martin; Gary Van Domselaar; Morag Graham; Amrita Bharat; Vanessa Allen; Linda Hoang; Brigitte Lefebvre; Greg Tyrrell; Greg Horsman; David Haldane; Richard Garceau; John Wylie; Tom Wong; Michael R Mulvey
Journal:  J Clin Microbiol       Date:  2014-11-05       Impact factor: 5.948

6.  Complete genome sequence of Neisseria gonorrhoeae NCCP11945.

Authors:  Gyung Tae Chung; Jeong Sik Yoo; Hee Bok Oh; Yeong Seon Lee; Sun Ho Cha; Sang Jun Kim; Cheon Kwon Yoo
Journal:  J Bacteriol       Date:  2008-06-27       Impact factor: 3.490

7.  Genomic Epidemiology and Molecular Resistance Mechanisms of Azithromycin-Resistant Neisseria gonorrhoeae in Canada from 1997 to 2014.

Authors:  Walter Demczuk; Irene Martin; Shelley Peterson; Amrita Bharat; Gary Van Domselaar; Morag Graham; Brigitte Lefebvre; Vanessa Allen; Linda Hoang; Greg Tyrrell; Greg Horsman; John Wylie; David Haldane; Chris Archibald; Tom Wong; Magnus Unemo; Michael R Mulvey
Journal:  J Clin Microbiol       Date:  2016-03-02       Impact factor: 5.948

8.  Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement.

Authors:  Bruce J Walker; Thomas Abeel; Terrance Shea; Margaret Priest; Amr Abouelliel; Sharadha Sakthikumar; Christina A Cuomo; Qiandong Zeng; Jennifer Wortman; Sarah K Young; Ashlee M Earl
Journal:  PLoS One       Date:  2014-11-19       Impact factor: 3.240

9.  Population structure of Neisseria gonorrhoeae based on whole genome data and its relationship with antibiotic resistance.

Authors:  Matthew N Ezewudo; Sandeep J Joseph; Santiago Castillo-Ramirez; Deborah Dean; Carlos Del Rio; Xavier Didelot; Jo-Anne Dillon; Richard F Selden; William M Shafer; Rosemary S Turingan; Magnus Unemo; Timothy D Read
Journal:  PeerJ       Date:  2015-03-05       Impact factor: 2.984

10.  Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors:  Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal:  Bioinformatics       Date:  2014-04-01       Impact factor: 6.937
