Nicholas R Waters1,2, Florence Abram1, Fiona Brennan1,3, Ashleigh Holmes4, Leighton Pritchard2,5.
Abstract
The Clermont PCR method for phylotyping Escherichia coli remains a useful classification scheme even though genome sequencing is now routine and higher-resolution sequence typing schemes are available. Relating present-day whole-genome E. coli classifications to legacy phylotyping is essential for harmonizing the historical literature and our understanding of this important organism. We therefore present EzClermont, a novel in silico Clermont PCR phylotyping tool that enables ready application of this phylotyping scheme to whole-genome assemblies. We evaluate this tool against phylogenomic classifications and against an alternative software implementation of Clermont typing. EzClermont is available as a web app at www.ezclermont.org, and as a command-line tool at https://nickp60.github.io/EzClermont/.
Keywords: Escherichia coli; bioinformatics; classification; genomics; phylogroups; web app
Year: 2020 PMID: 33195978 PMCID: PMC7656184 DOI: 10.1099/acmi.0.000143
Source DB: PubMed Journal: Access Microbiol ISSN: 2516-8290
Primers from the studies by Clermont and colleagues in 2013 and 2019 [4, 5]
Target amplicons were identified from canonical genes (or intergenic regions). Ambiguities determined by the training procedure were incorporated as degenerate primer sequences using standard IUPAC (International Union of Pure and Applied Chemistry) codes, which are translated into regular expressions by EzClermont. Variations occurring in the final five bases of the 3′ ends of the primers were not incorporated.
| Primer | Target gene | Canonical | Degenerate primer (5′→3′) |
|---|---|---|---|
| AceK_f | | NC_000913.3: 4218596–4222487 | AAYRCYATTCGCCAGCTTGC |
| ArpA1_r | | | TCTCCMCATACYGYACGCTA |
| chuA_1b | | NC_011750.1: c4160640–4158658 | ATGGTACYGGRCGAACCAAC |
| chuA_2 | | | TRCCRCCAGTRCCAAAGACA |
| yjaA_1b | | NC_000913.3: 4213234–4213617 | YAAACKTGAAGTGTCAGGAG |
| yjaA_2b | | | ARTRCGTTCCTCAACCTGTG |
| TspE4C2_1b | | AF222188.1 | CACKATTYGTAAGRYCATCC |
| TspE4C2_2b | | | AGTTTATCGCTGCGGGTCGC |
| ArpAgpE_f | | NC_000913.3: 4220301–4222487 | RATKCMATYTTGTCRAAATATGCC |
| ArpAgpE_r | | | GAAARKRAAAADAMYYYYCAAGAG |
| trpBA_f | | NC_000913.3: 1316416–1318415 | CGGSGATAAAGAYATYTTCAC |
| trpBA_r | | | GCAACGYGSCBWKRCGGAAG |
| ybgD_F | | NZ_UIKK01000035.1 | GTTGACTAARCGYAGGTCGA |
| ybgD_R | | | KATGYDGCYGATKAAGGATC |
| trpAgpC_1 | | NC_000913.3: c1317222–1316416 | AGTTYTAYGCCSVRWGCGAG |
| trpAgpC_2 | | | TCWGYDCYVGTYACGCCC |
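
The translation of degenerate primers into regular expressions, as described for the primer table, can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not EzClermont's own code: the names `primer_to_regex`, `reverse_complement` and `find_primer` are hypothetical, and the short target sequence in the usage lines is made up for the example. Only the standard IUPAC nucleotide codes and the AceK_f primer sequence come from the table.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

# Complement table for IUPAC codes (R<->Y, K<->M, B<->V, D<->H; S, W, N unchanged).
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_to_regex(primer: str) -> str:
    """Expand each IUPAC code in a primer into a regex character class."""
    return "".join(IUPAC_TO_REGEX[base] for base in primer.upper())


def reverse_complement(primer: str) -> str:
    """Reverse-complement a (possibly degenerate) primer, e.g. for reverse primers."""
    return primer.upper().translate(_COMPLEMENT)[::-1]


def find_primer(primer: str, sequence: str) -> list[int]:
    """Return 0-based start positions of non-overlapping primer matches."""
    pattern = re.compile(primer_to_regex(primer))
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Usage with the AceK_f primer from the table and a made-up target sequence.
acek_f = "AAYRCYATTCGCCAGCTTGC"
print(primer_to_regex(acek_f))  # AA[CT][AG]C[CT]ATTCGCCAGCTTGC
print(find_primer(acek_f, "GGAACGCTATTCGCCAGCTTGCTT"))  # [2]
```

A real in silico PCR would also search the opposite strand with the reverse-complemented reverse primer and check the distance between the two hits; those steps are omitted here to keep the sketch short.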
Fig. 1. Cladogram of whole-genome phylogeny for members of the ECOR collection and phylogroup G isolates from the work by Clermont and colleagues [5]. Clades are background-coloured by dominant phylogroup. The heatmap surrounding the tree shows phylogroups determined from: literature (inner ring), ClermonTyping (middle ring) and EzClermont (outer ring). The literature phylogroup was not supported by in silico analysis for seven strains. Both EzClermont and ClermonTyping agree with the phylogenetic lineage in all but two cases: ECOR44 and ECOR49.
Isolates with inconsistent phylogroup predictions
EzClermont and ClermonTyping were run on a set of strains with reported phylotypes. A core SNP tree was reconstructed, allowing comparison between predicted and reported phylotypes, and the estimated phylogeny.
| Strain | Accession no. | Reported | Phylogeny | ClermonTyping | EzClermont | Note |
|---|---|---|---|---|---|---|
| APEC01 | GCA_003028815.1 | B2 | A | A | A | |
| ECOR07 | GCA_003334305.1 | A | B1 | B1 | B1 | |
| ECOR23 | GCA_003334095.1 | A | B2 | B2 | B2 | |
| ECOR43 | GCA_003333775.1 | A | E | E | E | |
| ECOR44 | GCA_003333765.1 | D | D | E | G | ArpA1_r G17A |
| ECOR49 | GCA_003333685.1 | D | D* | G* | G* | |
| ECOR71 | GCA_003333385.1 | B1 | C | C | C | |
| ECOR72 | GCA_003334425.1 | B1 | B1 | C | C | |
| SMS-3-5 | GCA_000019645.1 | D | F | F | F | |
*Both tools mistype ECOR49 as phylogroup G, probably because of a contaminated assembly; ECOR49 from assembly GCA002190975.1 is correctly typed by both tools as phylogroup D.
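
The comparison summarized in this table can be reproduced with a few lines of Python. This is only an illustrative sketch: the `Call` record and the `disagreeing` helper are hypothetical names, the phylogroup calls are transcribed from the table above, and the asterisks on the ECOR49 entries are dropped for simplicity.

```python
from typing import NamedTuple


class Call(NamedTuple):
    strain: str
    reported: str       # phylogroup reported in the literature
    phylogeny: str      # phylogroup implied by the core SNP tree
    clermontyping: str  # ClermonTyping prediction
    ezclermont: str     # EzClermont prediction


# Rows transcribed from the table (footnote asterisks removed).
CALLS = [
    Call("APEC01", "B2", "A", "A", "A"),
    Call("ECOR07", "A", "B1", "B1", "B1"),
    Call("ECOR23", "A", "B2", "B2", "B2"),
    Call("ECOR43", "A", "E", "E", "E"),
    Call("ECOR44", "D", "D", "E", "G"),
    Call("ECOR49", "D", "D", "G", "G"),
    Call("ECOR71", "B1", "C", "C", "C"),
    Call("ECOR72", "B1", "B1", "C", "C"),
    Call("SMS-3-5", "D", "F", "F", "F"),
]


def disagreeing(calls: list[Call], field_a: str, field_b: str) -> list[str]:
    """Return the strains whose calls differ between two columns."""
    return [c.strain for c in calls if getattr(c, field_a) != getattr(c, field_b)]


# Strains on which the two tools disagree with each other.
print(disagreeing(CALLS, "clermontyping", "ezclermont"))  # ['ECOR44']

# Strains whose reported phylogroup differs from the tree-based one.
print(disagreeing(CALLS, "reported", "phylogeny"))
```

Any pair of columns can be compared the same way, for example `disagreeing(CALLS, "phylogeny", "ezclermont")`.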