Mostafa Y Abdel-Glil (1,2,3), Prasad Thomas (4), Christian Brandt (3), Falk Melzer (1), Anbazhagan Subbaiyan (4), Pallab Chaudhuri (4), Dag Harmsen (5), Keith A Jolley (6), Anna Janowicz (7), Giuliano Garofolo (7), Heinrich Neubauer (1), Mathias W Pletz (3).
Abstract
Brucellosis poses a significant burden to human and animal health worldwide. Robust and harmonized molecular epidemiological approaches and population studies that include routine disease screening are needed to efficiently track the origin and spread of Brucella strains. Core genome multilocus sequence typing (cgMLST) is a powerful genotyping system commonly used to delineate pathogen transmission routes for disease surveillance and control. Except for Brucella melitensis, no cgMLST schemes are currently established for Brucella species. Here, we describe a novel cgMLST scheme that covers multiple Brucella species. We first determined the phylogenetic breadth of the genus using 612 Brucella genomes. We then selected 1,764 genes that were particularly well conserved and typeable in at least 98% of these genomes. We tested the new scheme on 600 genomes and found high agreement with whole-genome single nucleotide polymorphism (SNP) analysis. Next, we applied the scheme to reanalyze the genomes of Brucella strains from epidemiologically linked outbreaks and demonstrated that it provides the high-resolution typing required in outbreak investigations, as previously reported with whole-genome SNP methods. We also used the novel scheme to define the global population structure of the genus using 1,322 Brucella genomes. Finally, we demonstrated that the distribution of Brucella strains can be traced by cluster analysis of cgMLST profiles and found nearly identical cgMLST profiles in different countries. Our results show that a sequencing depth of more than 40-fold is optimal for allele calling with this scheme. In summary, this study describes a novel Brucella-wide cgMLST scheme that is applicable in Brucella molecular epidemiology and helps to accurately track, and thus control, the sources of infection. The scheme is publicly accessible and should be a valuable resource for laboratories with limited computational resources and bioinformatics expertise.
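The 98% typeability criterion for choosing scheme targets can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory layout in which each genome's profile maps a locus name to an allele number, or to None when the locus could not be typed. It shows only the presence-fraction filter; the study's additional conservation criteria and its actual software pipeline are not reproduced here.

```python
from typing import Dict, List, Optional

# Hypothetical layout: locus name -> allele number, or None if untypeable.
Profile = Dict[str, Optional[int]]


def select_core_loci(profiles: List[Profile], min_fraction: float = 0.98) -> List[str]:
    """Return loci that are typeable in at least `min_fraction` of genomes."""
    if not profiles:
        return []
    loci = sorted({locus for p in profiles for locus in p})
    core = []
    for locus in loci:
        # Count genomes in which this locus received an allele call.
        typed = sum(1 for p in profiles if p.get(locus) is not None)
        if typed / len(profiles) >= min_fraction:
            core.append(locus)
    return core
```

With 50 genomes, for example, a locus missing in one genome (49/50 = 98%) is kept, while a locus missing in two genomes (96%) is dropped.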
Keywords: Brucella; cgMLST; core genome MLST; epidemiology; genomic typing; whole-genome typing
Year: 2022 PMID: 35852343 PMCID: PMC9387271 DOI: 10.1128/jcm.00311-22
Source DB: PubMed Journal: J Clin Microbiol ISSN: 0095-1137 Impact factor: 11.677
FIG 1 Taxonomic classification of 634 Brucella genomes downloaded from the NCBI Reference Sequence (RefSeq) database (March 2021). (A) Midpoint-rooted maximum-likelihood (ML) phylogeny of the Brucella genomes calculated with PhyloPhlAn, based on the concatenated alignment of 24,110 amino acid positions from up to 400 universally conserved bacterial proteins. (B) Pairwise average nucleotide identity (ANI) between all Brucella genomes, calculated with FastANI and plotted with bactaxR. The histogram shows the distribution of ANI values, and the dendrogram, created by average-linkage hierarchical clustering with tree height corresponding to ANI similarity, illustrates the relatedness between all genomes. The red branch denotes the genomes of all Brucella species, while the blue branches denote divergent genomes of Brucella intermedia (Ochrobactrum intermedium) and Brucella pituitosa, as explained in the text.
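Panel B's dendrogram is built by average-linkage hierarchical clustering. A naive Python sketch of that merge procedure follows. It assumes ANI values in percent are first converted to distances as 100 - ANI, and it is not the FastANI or bactaxR code. Merge heights are distances, so the ANI similarity at which two clusters join is 100 minus the reported height.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Merge = Tuple[FrozenSet[str], FrozenSet[str], float]


def ani_to_distance(ani_percent: float) -> float:
    """Convert an ANI value in percent to a dissimilarity (0 = identical)."""
    return 100.0 - ani_percent


def average_linkage(names: List[str], dist: Dict[Tuple[str, str], float]) -> List[Merge]:
    """Naive agglomerative clustering with average linkage.

    `dist` holds one entry per unordered pair, keyed in either order.
    Returns the merges as (cluster_a, cluster_b, height) in merge order.
    """
    def d(a: str, b: str) -> float:
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    clusters = [frozenset([n]) for n in names]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average linkage: mean distance over all cross-cluster genome pairs.
            avg = sum(d(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or avg < best[2]:
                best = (c1, c2, avg)
        c1, c2, height = best
        clusters = [c for c in clusters if c != c1 and c != c2]
        clusters.append(c1 | c2)
        merges.append((c1, c2, height))
    return merges
```

This brute-force version rescans every cluster pair at each step, which is fine for illustration but far slower than the optimized routines used for hundreds of genomes.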
FIG 2 Minimum spanning tree (MST) of 37 Brucella melitensis genomes with known epidemiological linkage, calculated using the 1,764 Brucella-wide core genome MLST targets. The MST was generated with Ridom SeqSphere, ignoring missing values in pairwise comparisons. Each circle represents a unique cgMLST profile and is labeled with the strain's city of isolation; circle size is proportional to the number of genomes with that cgMLST genotype. The numbers on the connecting lines give the allele differences between cgMLST profiles; solid and dashed lines represent differences below and above 10 alleles, respectively. Gray-shaded areas highlight clusters, defined by a threshold of three allele mismatches between any two neighbors. The inset compares cgMLST allele distances with core genome SNP distances for each genome pair. Genomic distances (cgMLST alleles and SNPs) within and between outbreak strains are highlighted by red and blue circles, respectively. For a phylogeny based on the core genome SNPs, we refer the reader to the original publication (35).
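Two rules in this legend, counting allele differences while ignoring missing values and grouping genomes at a three-allele threshold, can be sketched in Python. This is a minimal illustration, not Ridom SeqSphere's implementation: profiles are hypothetical lists of allele numbers with None for a missing locus, and the threshold is assumed to be inclusive (≤3). Single-linkage groups at a cutoff coincide with the groups left after cutting MST edges longer than that cutoff, so the sketch finds the same clusters without building the tree.

```python
from typing import Dict, List, Optional, Sequence

Profile = Sequence[Optional[int]]  # allele numbers; None = missing locus


def allele_distance(p: Profile, q: Profile) -> int:
    """Count loci with different alleles, skipping loci missing in either profile."""
    return sum(1 for a, b in zip(p, q) if a is not None and b is not None and a != b)


def threshold_clusters(profiles: Dict[str, Profile], threshold: int = 3) -> List[List[str]]:
    """Single-linkage clusters: genomes joined by chains of distances <= threshold."""
    names = sorted(profiles)
    parent = {n: n for n in names}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if allele_distance(profiles[a], profiles[b]) <= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Keep only groups of two or more genomes, like the shaded areas.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Note that ignoring missing values means the distance is not a true metric (two profiles can look closer than they are if many loci are missing), which is one reason adequate sequencing depth matters.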
FIG 3 Minimum spanning tree (MST) of 76 Brucella abortus genomes, calculated using the 1,764 Brucella-wide core genome MLST targets. The MST was generated with Ridom SeqSphere, ignoring missing values in pairwise comparisons. Each circle represents a unique cgMLST profile and is labeled according to the origin of the strains; circle size is proportional to the number of genomes with that cgMLST genotype. The numbers on the connecting lines give the allele differences between cgMLST profiles; solid and dashed lines represent differences below and above 10 alleles, respectively. Gray-shaded areas highlight clusters, defined by a threshold of three allele mismatches between any two neighbors. The inset compares cgMLST allele distances with core genome SNP distances for each genome pair. Genomic distances (cgMLST alleles and SNPs) within and between outbreak strains are highlighted by red and blue circles, respectively. For a phylogeny based on the core genome SNPs, we refer the reader to the original publication (45).
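The minimum spanning tree in these legends can be approximated with a textbook algorithm. The Python sketch below applies Prim's algorithm to the complete graph of pairwise allele distances; SeqSphere's MST may use its own tie-breaking rules, so trees from this sketch need not match the figures exactly. The legend does not say how a difference of exactly 10 is drawn, so the `line_style` helper treats it as solid purely as an assumption.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]


def prim_mst(names: List[str], dist: Dict[Tuple[str, str], int]) -> List[Edge]:
    """Prim's algorithm on the complete graph of pairwise allele distances.

    `dist` holds one entry per unordered pair (either key order).
    Returns tree edges as (parent, child, allele_difference).
    """
    def d(a: str, b: str) -> int:
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    if not names:
        return []
    root = names[0]
    # For each node outside the tree: (cheapest distance to the tree, tree node).
    best = {n: (d(root, n), root) for n in names[1:]}
    edges: List[Edge] = []
    while best:
        # Ties are broken by node name so the output is deterministic.
        node = min(best, key=lambda n: (best[n][0], n))
        weight, parent = best.pop(node)
        edges.append((parent, node, weight))
        for n in best:
            if d(node, n) < best[n][0]:
                best[n] = (d(node, n), node)
    return edges


def line_style(allele_diff: int, cutoff: int = 10) -> str:
    """Solid below the cutoff, dashed above; exactly `cutoff` is assumed solid."""
    return "solid" if allele_diff <= cutoff else "dashed"
```

Prim's algorithm is a natural fit here because the graph is complete (every genome pair has a distance), so its O(n²) array-style form needs no priority queue.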
FIG 4 Neighbor-joining tree of the 1,322 Brucella genomes constructed from cgMLST allelic profiles, revealing the characteristic population structure of pathogenic Brucella. The tree was visualized with iTOL.
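For readers unfamiliar with neighbor joining, the following Python sketch implements the standard Saitou-Nei procedure on a distance matrix such as pairwise cgMLST allele differences. It is an illustration under that assumption, not the authors' pipeline (iTOL only renders the tree). It returns an unrooted tree as a Newick string, attaching the length of the final edge to the second subtree.

```python
from typing import Dict, List, Tuple


def neighbor_joining(names: List[str], dist: Dict[Tuple[str, str], float]) -> str:
    """Saitou-Nei neighbor joining on a distance matrix.

    `dist` holds one entry per unordered pair (either key order).
    Returns an unrooted tree as a Newick string.
    """
    # Symmetric nested-dict matrix; joined clusters are keyed by their
    # Newick subtree string.
    D: Dict[str, Dict[str, float]] = {n: {} for n in names}
    for (a, b), v in dist.items():
        D[a][b] = D[b][a] = float(v)

    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(D[i][k] for k in nodes if k != i) for i in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(i, j) - r(i) - r(j).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to the new internal node.
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        # Distances from the new node to every remaining node.
        D[u] = {}
        for k in nodes:
            if k not in (i, j):
                D[u][k] = D[k][u] = 0.5 * (D[i][k] + D[j][k] - D[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    if len(nodes) < 2:
        return "".join(nodes) + ";"
    a, b = nodes
    return f"({a},{b}:{D[a][b]:g});"
```

On an additive distance matrix, neighbor joining recovers the true tree and branch lengths; on real allele distances, which are not perfectly additive, it can occasionally produce small negative branch lengths.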
FIG 5 Effect of sequencing depth and assembly software on allele calling rate and cluster analysis with the Brucella cgMLST scheme. (A) Box plots of the mean allele calling rate (%) for each assembler, using data generated at different coverage depths (n = 40 genomes per group). (B) Alluvial plots showing how read depth affects the clustering results of cgMLST data.
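The quantities behind this figure can be sketched in Python: the allele calling rate as the percentage of scheme loci with a called allele, averaged per assembler and depth group, and an estimated sequencing depth (total sequenced bases divided by genome size) checked against the abstract's recommendation of more than 40-fold. The group labels, input numbers, and the roughly 3.3 Mb genome size in the example are hypothetical, and the depth formula is the usual approximation, not necessarily how the study computed coverage.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence

Profile = Sequence[Optional[int]]  # allele numbers; None = locus not called


def calling_rate(profile: Profile) -> float:
    """Percentage of scheme loci for which an allele was called."""
    if not profile:
        return 0.0
    return 100.0 * sum(a is not None for a in profile) / len(profile)


def mean_rate_by_group(groups: Dict[str, List[Profile]]) -> Dict[str, float]:
    """Mean calling rate per group, e.g. one group per assembler and depth."""
    return {name: mean(calling_rate(p) for p in profiles) for name, profiles in groups.items()}


def estimated_depth(total_bases: int, genome_size: int) -> float:
    """Approximate fold coverage: sequenced bases divided by genome length."""
    return total_bases / genome_size


def meets_depth(depth: float, minimum: float = 40.0) -> bool:
    # The abstract recommends a sequencing depth of more than 40-fold.
    return depth > minimum


if __name__ == "__main__":
    # Hypothetical groups of 4-locus profiles for one assembler at two depths.
    groups = {
        "assembler_A/20x": [[1, None, 3, 4], [1, 2, 3, 4]],
        "assembler_A/50x": [[1, 2, 3, 4], [5, 2, 3, 4]],
    }
    print(mean_rate_by_group(groups))
    # Hypothetical run: 150 Mb of reads against a ~3.3 Mb genome.
    print(meets_depth(estimated_depth(150_000_000, 3_300_000)))
```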