Anish Pandey1,2, Maria Victoria Humbert2, Alexandra Jackson2, Jade L Passey3, David J Hampson4, David W Cleary1,2, Roberto M La Ragione3, Myron Christodoulides2.
Abstract
The enteric, pathogenic spirochaete Brachyspira pilosicoli colonizes and infects a variety of birds and mammals, including humans. However, there is a paucity of genomic data available for this organism. This study introduces 12 newly sequenced draft genome assemblies, boosting the cohort of examined isolates by fourfold and cataloguing the intraspecific genomic diversity of the organism more comprehensively. We used several in silico techniques to define a core genome of 1751 genes and qualitatively and quantitatively examined the intraspecific species boundary using phylogenetic analysis and average nucleotide identity, before contextualizing this diversity against other members of the genus Brachyspira. Our study revealed that an additional isolate that could not be species typed against any other Brachyspira lacked putative virulence factors present in all other isolates. Finally, we quantified that homologous recombination has as great an effect on the evolution of the core genome of B. pilosicoli as random mutation (r/m=1.02). Comparative genomics has informed Brachyspira diversity, population structure, host specificity and virulence. The data presented here can be used to contribute to developing advanced screening methods, diagnostic assays and prophylactic vaccines against this zoonotic pathogen.
Keywords: Brachyspira pilosicoli; Brachyspira genus; pangenome; recombination; microbial evolution
Year: 2020 PMID: 33174833 PMCID: PMC8116685 DOI: 10.1099/mgen.0.000470
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Isolates used in this study and genome assembly metrics

| | Strain | Host | Location | Accession | GC (%) | Total length (bp) | # contigs | N50 (bp) |
|---|---|---|---|---|---|---|---|---|
| | B11 | Chicken | Australia | VYIL00000000* | 27.70 | 3 156 632 | 26 | 528 032 |
| | B04 | Chicken | Australia | VYIM00000000* | 27.92 | 2 607 200 | 11 | 1 319 479 |
| | B06 | Chicken | Australia | VYIN00000000* | 27.81 | 2 789 651 | 80 | 165 358 |
| | B12 | Chicken | Australia | VYIO00000000* | 27.91 | 2 593 600 | 9 | 2 586 710 |
| | B14 | Chicken | Australia | VYIP00000000* | 27.91 | 2 588 857 | 47 | 181 709 |
| | B31 | Chicken | Australia | VYIQ00000000* | 27.91 | 2 594 260 | 11 | 1 911 521 |
| | B37 | Chicken | Australia | VYIR00000000* | 27.91 | 2 594 260 | 11 | 1 911 521 |
| | SAP_774 | Chicken | UK | VYIS00000000* | 27.90 | 2 568 130 | 30 | 2 542 180 |
| | SAP_822 | Chicken | UK | VYIT00000000* | 27.88 | 2 746 377 | 69 | 102 423 |
| | SAP_859 | Chicken | UK | VYIU00000000* | 27.82 | 2 546 568 | 20 | 227 560 |
| | SAP_865 | Chicken | UK | VYIV00000000* | 27.83 | 2 547 658 | 20 | 417 927 |
| | SAP_894 | Chicken | UK | VYIW00000000* | 27.91 | 2 713 772 | 58 | 195 119 |
| | SAP_898 | Chicken | UK | VYIY00000000* | 27.80 | 2 913 681 | 253 | 92 127 |
| | SAP_772 | Chicken | UK | VYIX00000000* | 27.90 | 2 745 085 | 21 | 182 551 |
| | 513 | Human | Denmark | † | 28.11 | 2 558 428 | 11 | 2 532 317 |
| | NSH-16 | Pig | USA | CP019914 | 27.38 | 3 189 639 | 1 | 3 189 639 |
| | ATCC27164 | Pig | USA | NZ_CP015910 | 27.04 | 3 074 045 | 2 | 3 041 447 |
| | B256/ATCC29796 | Pig | UK | ARQI01000129 | 27.73 | 3 281 611 | 130 | 52 799 |
| | PWS/A | Pig | UK | CP002874 | 27.21 | 3 308 048 | 2 | 3 304 788 |
| | DSM 12563 | Pig | Canada | CP001959 | 27.75 | 3 241 804 | 1 | 3 241 804 |
| | 95/1000 | Pig | Australia | CP002025 | 27.90 | 2 586 443 | 1 | 2 586 443 |
| | B2904 | Chicken | UK | CP003490 | 27.79 | 2 765 477 | 1 | 2 765 477 |
| | P43/6/78 | Pig | UK | CP002873 | 27.92 | 2 555 556 | 1 | 2 555 556 |
| | WesB | Human | Australia | HE793032 | 27.73 | 2 889 522 | 1 | 2 889 522 |
| | AN4859/03 | Pig | Sweden | CVLB01000001 | 27.00 | 3 256 103 | 30 | 2 243 936 |

All Brachyspira genomes used during the course of this study are listed in this table. Isolates assembled during the course of this study are separated from assemblies sourced from public repositories by a dividing line and have the ‘*’ symbol following their accession number. The species of every assembly has been confirmed by Kraken and the accession numbers provided were obtained from NCBI GenBank. Genome statistics, including GC percentage, total length of the isolate genome, number of contigs constituting the assembly and the largest contig and the N50 value are given. N50 is a weighted median statistic that comments on the distribution of contig lengths and overall genome assembly quality. Fifty per cent of the entire assembly is contained in contigs equal to or larger than the N50 value. ‘B’ refers to the genus Brachyspira.
†The B. aalborgi strain was obtained from the Sanger METAHIT consortium (https://www.sanger.ac.uk/resources/downloads/bacteria/metahit/).
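
The N50 definition in the table legend can be made concrete with a short calculation. The following is a minimal Python sketch, not the assembly-statistics tool used in the study; the function name `n50` and the second set of contig lengths are hypothetical and chosen only for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths in bp."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# A single-contig assembly (such as 95/1000 in the table) has an N50
# equal to its total length.
print(n50([2586443]))

# Hypothetical contig lengths, for illustration only.
print(n50([50, 40, 30, 20, 10]))
```

Because the contigs are sorted by length before accumulation, a few long contigs dominate the statistic, which is why fragmented assemblies with many short contigs report a low N50 even when their total length is similar.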
Primer sequences for genus- and species-specific PCR

| Target species | Target gene | Primer name | Primer sequence (5′−3′) | Size (bp) | Reference |
|---|---|---|---|---|---|
| Genus | 16S rRNA | Br16S-F | TGAGTAACACGTAGGTAATC | 1309 | [ |
| | | Br16S-R | GCTAACGACTTCAGGTAAAAC | | |
| | 16S rRNA | Acoli-F | AGAGGAAAGTTTTTTCGCTT | 439 | [ |
| | | Acoli-R | CCCCTACAATATCCAAGACT | | |

MLST allele data for 12 isolates

| Isolate | adh | alp | est | gdh | glpK | pgm | thi | ST |
|---|---|---|---|---|---|---|---|---|
| B04 | 5 | | | 12 | | | | |
| B06 | 4 | | | 69 | 3 | 102 | | |
| B12 | 5 | 37 | 105 | 27 | 18 | 49 | 35 | ST134 |
| B31 | 5 | 37 | 105 | 27 | 18 | 49 | 35 | ST134 |
| B37 | 5 | 37 | 105 | 27 | 18 | 49 | 35 | ST134 |
| B67 | 3 | 11 | 12 | 9 | 8 | | 10 | |
| SAP_774 | 3 | | | 22 | | | | |
| SAP_822 | | | | | | 82 | | |
| SAP_865 | 3 | | 12 | 25 | | 129 | | |
| SAP_859 | 3 | | 12 | 25 | | 129 | | |
| SAP_894 | 3 | 3 | | 59 | | 86 | | |
| SAP_898 | | | | | | 82 | | |

MLST allele data for the 12 B. pilosicoli isolates. Novel alleles and sequence types (STs) are indicated in bold. Allele abbreviations are as follows: adh, alcohol dehydrogenase; alp, alkaline phosphatase; est, esterase; gdh, glutamate dehydrogenase; glpK, glucose kinase; pgm, phosphoglucomutase; and thi, acetyl-CoA acetyltransferase.
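
A sequence type is assigned only when all seven loci carry numbered alleles that together match a known profile. The following Python sketch illustrates that lookup under stated assumptions: the locus order is taken from the abbreviation list in the legend, the only profile included is the complete one shown in the table (ST134), and the names `LOCI`, `KNOWN_PROFILES` and `assign_st` are hypothetical. It is not the typing software used in the study.

```python
# Locus order assumed from the abbreviation list in the legend.
LOCI = ("adh", "alp", "est", "gdh", "glpK", "pgm", "thi")

# Only the one complete profile shown in the table; a real MLST scheme
# would contain many more entries.
KNOWN_PROFILES = {(5, 37, 105, 27, 18, 49, 35): "ST134"}


def assign_st(alleles):
    """Return the ST for a dict of locus -> allele number, or None."""
    profile = tuple(alleles.get(locus) for locus in LOCI)
    if None in profile:
        return None  # at least one locus has no assigned allele number
    return KNOWN_PROFILES.get(profile)


# Complete profile shared by B12, B31 and B37 in the table.
b12 = dict(zip(LOCI, (5, 37, 105, 27, 18, 49, 35)))
print(assign_st(b12))

# An incomplete profile (two loci only, under the assumed column order)
# cannot be assigned a sequence type.
print(assign_st({"adh": 5, "gdh": 12}))
```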
Fig. 1. COG category metrics and proportional distribution in the pangenome of B. pilosicoli. The diagram shows the percentage proportion of COGs in the core (blue) vs accessory genomes (red). *, there is a significant difference in the distribution of the numbers of a category present in the core and accessory genome compared to COGs in the core/accessory not assigned to that given COG category. (Contingency table, χ²-corrected, 1 degree of freedom, two-tailed, Bonferroni-corrected P value<0.002.)
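
The test named in the Fig. 1 legend can be sketched for a single COG category as a 2×2 contingency table (core vs accessory, in the category vs not in it). The Python example below is an illustration under assumptions, not the authors' analysis script: it assumes the correction is Yates' continuity correction, the counts are hypothetical, and the number of tests passed to the Bonferroni step (25) is an assumption chosen because 0.05/25 reproduces the 0.002 threshold in the legend.

```python
import math


def yates_chi2_2x2(a, b, c, d):
    """Yates-corrected chi-squared statistic and P value for a 2x2 table.

    Rows: core, accessory. Columns: in category, not in category.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    # Continuity correction, clamped at zero so it never over-corrects.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts for one category, for illustration only.
chi2, p = yates_chi2_2x2(10, 20, 30, 40)
threshold = bonferroni_threshold(25)  # assumed number of categories tested
print(chi2, p, p < threshold)
```

The P value is computed with `math.erfc` rather than a statistics library so that the sketch has no third-party dependencies; this shortcut is valid only for 1 degree of freedom.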
Fig. 2. Phylogenetic tree. Phylogenetic inference reveals the extent of genomic diversity both within the genus and interspecifically among isolates. The tree was generated using RAxML version 8.2 and a GTR model. An alignment of 27 CDS was found to be core (95 % Seq-ID) to this dataset generated using the MAFFT aligner, rooted with genome, which was found to be the most distant.
Fig. 3. Recombination detection on the core genome of B. pilosicoli. Recombination events and sites are indicated along the branches of the phylogenetic tree. A given branch may display results for a specific isolate (see isolate label) or for an ancestral phylogenetic node (no isolate label). Recombination events are indicated by dark blue bars, while light blue sites are used to indicate no substitution and white sites indicate that a convergent mutation has occurred (one base or more) at that point in the phylogeny. When multiple convergent events occur within a short nucleotide distance of each other, a recombination event or ‘importation’ is identified by the software at a specific position in the core alignment, affecting x number of isolates or x number of ancestral phylogenetic nodes which affect multiple isolates. Recombination events discussed in the Results section are indicated with yellow, green and red circles. Their corresponding genes are identified with full annotation data in Data S6.
Fig. 4. Gene distribution map of the pangenome plus SAP_772. On the left of the figure is an unrooted, ML phylogeny based on an accessory genome alignment (for more accurate inference of phylogenetic relationships see Fig. 2). In the centre, blue segments represent gene presence and white segments represent gene absence. The pangenome is displayed, starting from the core genome on the left and transitioning into the accessory genome (shell and cloud genomes) with increasing gene sequence disparity. The bottom graph displays a trace showing the percentage of isolates containing blue segments of gene presence. Similar to the centre, the graph starts with the core genome (n=17 genomes) and falls steadily as it transitions into the accessory genome [shell: n=2–16 isolates (approximately 95 % to >1 % of the B. pilosicoli plus B. SAP_772 cohort) and cloud: n=1 isolate (approx. <1 % of the cohort)], displaying blocks of genes shared by fewer and fewer genomes. Red lines are used to mark the transitions from the core genome to the shell and cloud genomes.
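
The core, shell and cloud partitions described in the Fig. 4 legend can be expressed as a simple rule on the number of genomes carrying each gene. The Python sketch below uses the counts given in the legend (core: all 17 genomes; shell: 2 to 16; cloud: 1); the function names, gene names and isolate names are hypothetical, and this is not the pangenome software used in the study.

```python
from collections import Counter


def classify_gene(n_present, n_genomes=17):
    """Classify a gene by how many genomes carry it (counts from the legend)."""
    if not 1 <= n_present <= n_genomes:
        raise ValueError("n_present must be between 1 and n_genomes")
    if n_present == n_genomes:
        return "core"
    if n_present == 1:
        return "cloud"
    return "shell"


def partition_pangenome(presence, n_genomes=17):
    """Count genes per partition.

    presence maps a gene name to the set of isolates that carry it.
    """
    return Counter(
        classify_gene(len(isolates), n_genomes) for isolates in presence.values()
    )


# Hypothetical three-genome example, for illustration only.
toy_presence = {
    "geneA": {"iso1", "iso2", "iso3"},
    "geneB": {"iso1", "iso2"},
    "geneC": {"iso3"},
}
toy_counts = partition_pangenome(toy_presence, n_genomes=3)
print(toy_counts)
```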
Fig. 5. Average nucleotide identity (ANI). These four heatmaps visualize four tests performed by PyANI. These are (a) 1020 bp fragment blastn+ analysis, ANIb, (b) ANI-blast_all via legacy blastn on 1020 bp fragments, ANIb-allvall, (c) muscle alignment ANI-M and, lastly, (d) tetranucleotide frequency analysis ANI-TETRA. As shown by the scale, red indicates increasing homology, while blue denotes decreases in homology between isolates examined in the heatmaps.
Matches and no matches to B. pilosicoli and B. SAP_772 isolates subject to virulence factor analysis

| Sample | Matches/putative alleles | No match |
|---|---|---|
| | 200 | 3 |
| | 200 | 4 |
| | 199 | 4 |
| | 199 | 4 |
| | 198 | 6 |
| | 198 | 4 |
| | 197 | 4 |
| | 196 | 7 |
| | 194 | 3 |
| | 190 | 9 |
| | 188 | 10 |
| | 171 | 6 |
| | 159 | 16 |

This table lists the lack of matches versus exact matches and instances of putative allele assignment using SRST2 default parameters. Exact matches are instances where sequencing reads have mapped in perfect alignment with a virulence factor CDS. SRST2 assigns putative allele status to any sequence sharing a minimum of 90 % coverage with reads mapping to it. Uncertain results flagged by the pipeline were inspected manually and removed from counts.
List of putative virulence factors undetected among one sample of isolate reads from B. pilosicoli or B. SAP_772

| Isolate | No. | Undetected putative virulence factors |
|---|---|---|
| SAP_772 | 10 | B2904_orf1521, iron/sulphur flavoprotein |
| | | BP951000_1807, membrane lipoprotein |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| SAP_894 | 3 | BP951000_0437, peptidase C14, caspase catalytic subunit p20 |
| | | BP951000_1159/ |
| | | |
| SAP_822 | 3 | ADK31727/ |
| | | BP951000_0437, peptidase C14, caspase catalytic subunit p20 |
| | | BP951000_1779, probable metal-dependent glycoprotease |
| B06 | 2 | B2904_orf2005, lipoprotein |
| | | B2904_orf651, lipoprotein |
| B14 | 1 | |
| SAP_774 | 1 | BP951000_2039, putative periplasmic binding protein |
| B04 | 0 | |
| B12 | 0 | |
| SAP_859 | 0 | |
| SAP_865 | 0 | |
| B31 | 0 | |
| SAP_898 | 0 | |
| B37 | 0 | |

Accession numbers, cluster identities and functional annotation available for putative virulence factors are detailed in Data S1. *These proteins had the same functional annotation as another protein named the same but possessed <90 % sequence identity. They are labelled with an underscore and capital letter to distinguish them for virulence factor screening.