Literature DB >> 25167160

Optimized multilocus sequence typing (MLST) scheme for Trypanosoma cruzi.

Patricio Diosque¹, Nicolás Tomasini¹, Juan José Lauthier¹, Louisa Alexandra Messenger², María Mercedes Monje Rumi¹, Paula Gabriela Ragone¹, Anahí Maitén Alberti-D'Amato¹, Cecilia Pérez Brandán¹, Christian Barnabé³, Michel Tibayrenc³, Michael David Lewis², Martin Stephen Llewellyn², Michael Alexander Miles², Matthew Yeo².

Abstract

Trypanosoma cruzi, the aetiological agent of Chagas disease possess extensive genetic diversity. This has led to the development of a plethora of molecular typing methods for the identification of both the known major genetic lineages and for more fine scale characterization of different multilocus genotypes within these major lineages. Whole genome sequencing applied to large sample sizes is not currently viable and multilocus enzyme electrophoresis, the previous gold standard for T. cruzi typing, is laborious and time consuming. In the present work, we present an optimized Multilocus Sequence Typing (MLST) scheme, based on the combined analysis of two recently proposed MLST approaches. Here, thirteen concatenated gene fragments were applied to a panel of T. cruzi reference strains encompassing all known genetic lineages. Concatenation of 13 fragments allowed assignment of all strains to the predicted Discrete Typing Units (DTUs), or near-clades, with the exception of one strain that was an outlier for TcV, due to apparent loss of heterozygosity in one fragment. Monophyly for all DTUs, along with robust bootstrap support, was restored when this fragment was subsequently excluded from the analysis. All possible combinations of loci were assessed against predefined criteria with the objective of selecting the most appropriate combination of between two and twelve fragments, for an optimized MLST scheme. The optimum combination consisted of 7 loci and discriminated between all reference strains in the panel, with the majority supported by robust bootstrap values. Additionally, a reduced panel of just 4 gene fragments displayed high bootstrap values for DTU assignment and discriminated 21 out of 25 genotypes. We propose that the seven-fragment MLST scheme could be used as a gold standard for T. cruzi typing, against which other typing approaches, particularly single locus approaches or systematic PCR assays based on amplicon size, could be compared.

Entities: CellLine Chemical Disease Species

Mesh：

Year: 2014 PMID： 25167160 PMCID： PMC4148231 DOI： 10.1371/journal.pntd.0003117

Source DB: PubMed Journal: PLoS Negl Trop Dis ISSN： 1935-2727

Introduction

Trypanosoma cruzi, the protozoan causative agent of Chagas disease, is a monophyletic and genetically heterogeneous taxon, with at least six phylogenetic lineages formally recognised as Discrete Typing Units (DTUs), TcI–TcVI [1], or near-clades (clades that are blurred by infrequent inter-lineage genetic recombination, [2]). T. cruzi is considered to have a predominantly clonal population structure but with at least some intra-lineage recombination [3], [4], [5], [6]. TcI and TcII are the most genetically distant groups, and the evolutionary origins of TcIII and TcIV remain controversial. Based on sequencing of individual nuclear genes Westenberger et al. [7] suggested that an ancient hybridisation event occurred between TcI and TcII followed by a long period of clonal propagation leading to the extant TcIII and TcIV. Alternatively, de Freitas et al. [8] suggested that TcIII and TcIV have a separate evolutionary ancestry with mitochondrial sequences that are similar to each other but distinct from both TcI and TcII. Recently, Flores-Lopez and Machado [9] proposed that TcIII and TcIV have no hybrid origin. Based on the sequence of 32 genes, they strongly suggested that TcI, TcIII and TcIV are clustered into a major clade that diverged from TcII around 1–2 millions of years ago. Less controversially, it is clear that TcV and TcVI, both overwhelmingly represented in the domestic transmission cycles in the Southern Cone region of South America, are hybrid lineages sharing haplotypes from both TcII and TcIII, with both DTUs retaining the mitochondrial genome of TcIII [8], [10]. Recent phylogenetic studies suggest that the emergence of the hybrid lineages TcV and TcVI may have occurred within the last 60,000 years [11]. Reliable DTU identification and the potential for high resolution investigation of genotypes at the intra DTU level are of great interest for epidemiological, host association, clinical and phylogenetic studies. Historically, a plethora of typing techniques have been applied to T. cruzi. Initial pioneering work applied multilocus enzyme electrophoresis (MLEE) techniques [12], [13], [14], [15], [16], [17], [18], [19], [20] revealing the remarkable genetic heterogeneity of this parasite. Subsequently, several PCR-based typing assays have been designed to differentiate the main DTUs [21], [22], [23], [24] and more recently, combinations of PCR-RFLP schemes have been published [25], [26], [27]. Some approaches based on DTU characterisation by direct sequential PCR amplifications from blood and tissue samples are also promising, although various sensitivity and reliability issues need to be resolved [28], [29], [30]. Microsatellite typing (MLMT) has also been applied to population data for fine-scale intra DTU genetic analysis [31], [32], [33]. Multilocus sequence typing (MLST), originally developed for bacterial species typing, has now been applied to a wide range of prokaryotic [34], [35], [36], [37] and increasingly eukaryotic microorganisms [38], [39], [40], [41], [42], [43], [44], [45], [46], [47], [48]. The technique typically involves the sequencing and concatenation of six to ten internal fragments of single copy housekeeping genes per strain [49]. Data are often hosted on interactive open access databases such as MLST.net for use in the wider research community. A major advantage of MLST analysis is that sequence data are unambiguous, minimizing interpretative errors. In this context, the MLST approach represents an excellent candidate to become the gold standard for T. cruzi genetic typing with outputs suitable for phylogenetic and epidemiological studies, particularly where large numbers of isolates from varied sources are under study. Recently, two multilocus sequence typing (MLST) schemes have been developed in parallel for T. cruzi, each of them based on different gene targets [50],[51]. Both schemes display a high discriminatory power and are able to clearly differentiate the main T. cruzi DTUs. The current work proposes to resolve the optimum combination of loci across the two schemes to produce a reproducible and robust formalised MLST scheme validated across a shared reference panel of isolates for practical use by the wider T. cruzi research community.

Methods

Parasite strains and DNA isolation

Twenty five cloned reference strains belonging to the six known DTUs were examined (Table 1). These strains have been widely used as reference strains in many previous studies, and are regularly examined in our laboratory by Multilocus Enzyme Electrophoresis (MLEE). Parasite stocks were cultivated at 28°C in liver infusion tryptose (LIT) supplemented with 1% hemin, 10% fetal bovine serum, 100 units/ml of penicillin, and 100 µg/mL of streptomycin or in supplemented RPMI liquid medium.

Table 1

Cohort of clonal reference isolates representing the six known T. cruzi lineages (DTUs).

Strain	DTU	Origin	Host
1. X10cl1	TcI	Belém, Brazil	Homo sapiens
2. Cutia c1	Tcl	Espiritu Santo, Brazil	Dasyprocta aguti
3. Sp104 cl1	Tcl	Region IV, Chile	Triatoma spinolai
4. P209 cl93	Tcl	Sucre, Bolivia	Homo sapiens
5. OPS21 cl11	Tcl	Cojedes, Venezuela	Homo sapiens
6. 92101601P cl1	TcI	Georgia, USA	Didelphis marsupialis
7. TU18 cl93	TcII	Potosí, Bolivia	Triatoma infestans
8. CBB cl3	TcII	Region IV, Chile	Homo sapiens
9. Mas cl1	TcII	Federal District, Brazil	Homo sapiens
10. IVV cl4	TcII	Region IV, Chile	Homo sapiens
11. Esm cl3	TcII	Sào Felipe, Brazil	Homo sapiens
12. M5631 cl5	TcIII	Selva Terra, Brazil	Dasypus novemcinctus
13. M6241 cl6	TcIII	Belem, Brazil	Homo sapiens
14. CM17	TcIII	Meta, Colombia	Dasypus sp.
15. X109/2	TcIII	Makthlawaiya, Paraguay	Canis familiaris
16. 92122102R	TcIV	Georgia, USA	Procyon lotor
17. CanIII cl1	TcIV	Belém, Brazil	Homo sapiens
18. Dog Theis	TcIV	USA	Canis familiaris
19. Mn cl2	TcV	Region IV, Chile	Homo Sapiens
20. Bug 2148 cl1	TcV	Rio Grande do sul, Brazil	Triatoma infestans
21. SO3 cl5	TcV	Potosi, Bolivia	Triatoma infestans
22. SC43 cl1	TcV	Santa-Cruz, Bolivia	Triatoma infestans
23. CL Brener	TcVI	Rio Grande do Sul, Brazil	Triatoma infestans
24. P63 cl1	TcVI	Makthlawaiya, Paraguay	Triatoma infestans
25. Tula cl2	TcVI	Talahuen, Chile	Homo sapiens

MLST loci

Initially a total of 19 gene fragments were considered, 10 housekeeping genes previously described by Lauthier et al. [50] [Glutathione peroxidase (GPX), 3-Hidroxi-3-metilglutaril-CoA reductase (HMCOAR), Piruvate dehydrogenase component E1 subunit alfa (PDH), Small GTP-binding protein Rab7 (GTP), Serine/treonine-protein phosphatase PP1 (STPP2), Rho-like GTP binding protein (RHO1), Glucose-6-phosphate isomerase (GPI), Superoxide dismutase A (SODA), Superoxide dismutase B (SODB) and Leucine aminopeptidase (LAP)]; and 9 gene fragments from Yeo et al. [51] [ascorbate-dependent haemoperoxidase (TcAPX), dihydrofolate reductase-thymidylate synthase (DHFR-TS), glutathione-dependent peroxidase II (TcGPXII), mitochondrial peroxidase (TcMPX), trypanothione reductase (TR), RNA-binding protein-19 (RB19), metacyclin-II (Met-II), metacyclin-III (Met-III) and LYT1]. However, 6 of them were discarded based on initial findings [50], [51]. Although some of the excluded targets were informative, they were not amenable for routine use. More specifically, LYT1 was discarded due to unreliable PCR amplification and sequencing despite multiple attempts at optimization; TR, DHFR-TS and TcAPX were also deemed unsuitable as internal sequencing primers were required; finally, Met-III and TcGPXII were also excluded because generated non-specific PCR products with some isolates. The final 13 gene fragments assessed included 3 fragments described by Yeo et al. [51] and the 10 housekeeping genes previously described by Lauthier et al. [50]. These were: TcMPX, RB19, Met-II, SODA, SODB, LAP, GPI, GPX, PDH, HMCOAR, RHO1, GTP and STPP2. For the 13 loci under study, searches in the CL-Brener and Sylvio X10 genomes (http://tritrypdb.org/tritrypdb/), using the primer sequences as well as the fragment sequences as query, displayed single matches in all cases. Chromosome location, primer sequences and amplicon size for each target are shown in Table 2. Nucleotide sequences for all the analysed MLST targets are available from GenBank under the following accession numbers: JN129501-JN129502, JN129511-JN129518, JN129523-JN129524, JN129534-JN129535, JN129544-JN129551, JN129556-JN129557, JN129567-JN129568, JN129577-JN129584, JN129589-JN129590, JN129600-JN129601, JN129610-JN129617, JN129622-JN129623, JN129633-JN129634, JN129643- JN129650, JN129655-JN129656, JN129666-JN129667, JN129676-JN129683, JN129688-JN129689, JN129699-JN129700, JN129709-JN129716, JN129721-JN129722, JN129732-JN129733, JN129742-JN129749, JN129754-JN129755, JN129765-JN129766, JN129775-JN129782, JN129787-JN129788, JN129798-JN129799, JN129808-JN129815, JN129820-JN129821, KF889442-KF889646. Additionaly, we used T. cruzi marinkellei as outgroup. Sequence data of the selected targets for T. cruzi marinkellei were obtained from TriTrypDB (http://tritrypdb.org), under the following accession Ids: TcMARK_CONTIG_2686, TcMARK_CONTIG_670, TcMARK_CONTIG_1404, Tc_MARK_2068, Tc_MARK_3409, Tc_MARK_5695, Tc_MARK_9874, Tc_MARK_515, Tc_MARK_4984, Tc_MARK_5926, Tc_MARK_8923, TcMARK_CONTIG_1818 and Tc_MARK_2666.

Table 2

Details of gene targets.

Gene	Gene IDa	Chromosome Number	Primer Sequence (5′-3′)	Amplicon size (bp)b	Sequence start 5′c	Fragment Length (bp)d
GPI * †	Tc00.1047053506529.508	6	CGCCATGTTGTGAATATTGG (20)	405	21	365
			GGCGGACCACAATGAGTATC (20)
HMCOAR * †	TC00.1047053506831.40	32	AGGAGGCTTTTGAGTCCACA (20)	554	21	514
			TCCAACAACACCAACCTCAA (20)
RHO1 * †	Tc00.1047053506649.40	8	AGTTGCTGCTTCCCATCAAT (20)	455	21	415
			CTGCACAGTGTATGCCTGCT (20)
Tc MPX * †	Tc00.1047053509499.14	22	ATGTTTCGTCGTATGGCC (18)	678	109	505
			TGCGTTTTTCTCAAAATATTC (21)
LAP *	Tc00.1047053508799.240	27	TGTACATGTTGCTTGGCTGAG (21)	444	22	402
			GCTGAGGTGATTAGCGACAAA (21)
SODB *	Tc00.1047053507039.10	35	GCCCCATCTTCAACCTT (17)	313	18	266
			TAGTACGCATGCTCCCATA (19)
RB19 *	Tc00.1047053507515.60	29	GCCTACACCGAGGAGTACCA (20)	408	49	340
			TTCTCCAATCCCCAGACTTG (20)
GPX	Tc00.1047053511543.60	35	CGTGGCACTCTCCAATTACA (20)	360	21	321
			AATTTAACCAGCGGGATGC (19)
PDH	Tc00.1047053507831.70	40	GGGGCAAGTGTTTGAAGCTA (20)	491	21	451
			AGAGCTCGCTTCGAGGTGTA (20)
GTP	Tc00.1047053503689.10	12	TGTGACGGGACATTTTACGA (20)	561	21	521
			CCCCTCGATCTCACGATTTA (20)
SODA	Tc00.1047053509775.40	21	CCACAAGGCGTATGTGGAC (19)	300	20	263
			ACGCACAGCCACGTCCAA (18)
STPP2	Tc00.1047053507673.10	34	CCGTGAAGCTTTTCAAGGAG (20)	409	21	369
			GCCCCACTGTTCGTAAACTC (20)
Met-II	TC00.1047053510889.280	6	TCATCTGCACCGATGAGTTC (20)	700	51	389
			CTCCATAGCGTTGACGAACA (20)

*Gene fragments included in the 7 loci MLST scheme;

Gene fragments included in the reduced 4 loci MLST scheme;

Gene ID: GenBank access number for the complete gene in the CL-Brener strain;

Amplicon size refers to the sequence size of the gene fragment including the primers regions;

5′ starting position: indicates the position where the analyzed sequence starts, counting from the first base of the amplicon;

Fragment Length refers to the sequence length used for the analyses (the analyzed fragments do not include the primer regions).

*Gene fragments included in the 7 loci MLST scheme; Gene fragments included in the reduced 4 loci MLST scheme; Gene ID: GenBank access number for the complete gene in the CL-Brener strain; Amplicon size refers to the sequence size of the gene fragment including the primers regions; 5′ starting position: indicates the position where the analyzed sequence starts, counting from the first base of the amplicon; Fragment Length refers to the sequence length used for the analyses (the analyzed fragments do not include the primer regions).

Molecular methods

PCRs were performed in 50 µl reaction volumes containing 100 ng of DNA, 0.2 µM of each primer, 1 U of goTaq DNA polymerase (Promega), 10 µl of buffer (supplied with the GoTaq polymerase) and a 50 µM concentration of each deoxynucleoside triphosphate (Promega). Amplification conditions for all targets were: 5 min at 94°C followed by 35 cycles of 94°C for 1 min; 55°C 1 min, and 72°C for 1 min, with a final extension at 72°C for 5 min. Amplified fragments were purified (QIAquick, Qiagen) and sequenced in both directions (ABI PRISM 310 Genetic Analyzer or ABI PRISM 377 DNA Sequencers, Applied Biosystems) using standard protocols. Primers used for sequencing were identical to those used in PCR amplifications. In order to assess reproducibility, each PCR amplification was performed multiple times and associated sequencing was repeated at least twice.

Data analysis

MLST data were analysed with MLSTest software (http://ipe.unsa.edu.ar/software) [52] with the objective of identifying the most resolutive and minimum number of targets for unequivocal DTU assignment and potential fine scale characterisation. MLSTest contains a suite of MLST data specific analytical tools. Briefly, single nucleotide polymorphisms (SNPs) were identified in all loci in MLSTest alignment viewer. Typing efficiency (TE) was calculated using the same software. TE for a determined locus is calculated as the number of identified genotypes divided by the number of polymorphic sites in this locus. Additionally, discriminatory power, defined as the probability that two strains are distinguished when chosen at random from a population of unrelated strains [53], was determined for each target (Table 3).

Table 3

T. cruzi MLST targets.

Gene fragment	No. of genotypes	No. of polymorphic sites	Typing efficiency1	DP
GPI * †	9	18	0.500	0.889
HMCOAR * †	15	20	0.750	0.954
RHO1 * †	13	23	0.565	0.914
Tc MPX * †	11	12	0.917	0.905
LAP *	13	16	0.812	0.942
SODB *	12	9	1.333	0.914
RB19 *	21	26	0.808	0.985
GPX	12	16	0.750	0.908
PDH	11	15	0.733	0.920
GTP	10	18	0.556	0.905
SODA	10	10	1.000	0.880
STPP2	5	8	0.625	0.585
Met-II	19	40	0.475	0.978

DP: Discriminatory Power according to [53],

Number of genotypes per polymorphic site,

*Included in the seven loci scheme,

Included in the four loci scheme.

DP: Discriminatory Power according to [53], Number of genotypes per polymorphic site, *Included in the seven loci scheme, Included in the four loci scheme. Sequence data were concatenated and Neighbour Joining phylogenetic trees were generated by using uncorrected p-distances. Heterozygous sites were handled in the analyses using two different methods. First, a SNP duplication method described by Yeo et al. and Tavanti et al. [51], [54] was implemented. Briefly, the SNP duplication method involves the elimination of monomorphic sites and duplication of polymorphisms in order to “resolve” the heterozygous sites. As an example, a homozygous variable locus scored as C (cytosine) will be modified by CC; while a heterozygous locus, for example Y (C or T, in accordance with IUPAC nomenclature), will be scored as CT. Alternatively, heterozygous SNPs were considered as average states. In more detail, the genetic distance between T and Y (heterozygosity composed of T and C) is considered as the mean distance between the T and the possible resolutions of Y (distance T-T = 0 and distance T-C = 1, average distance = 0.5, see [53] and MLSTest 1.0 manual at http://www.ipe.unsa.edu.ar/software for further details). Statistical support was evaluated by 1000 bootstrap replications. Overall phylogenetic incongruence among loci (by comparison with the concatenated topology) was assessed by the Incongruence Length Difference Test using the BIO-Neighbour Joining method (BIONJ-ILD, [55]) and evaluated by a permutation test with 1,000 replications. Briefly, the ILD evaluates whether the observed incongruence among fragments is higher than that expected by random unstructured homoplasy across the different fragments. A statistical significant ILD p value indicates that many sites, in at least one fragment, support a phylogeny that is contradicted by other fragments. In order to localize significant incongruent branches in concatenated data we used the Neighbour Joining based Localized Incongruence Length Difference (NJ-LILD) test available in MLSTest. NJ-LILD is a variant of the ILD test that allows localizing incongruence at branch level. All combinations from 2 to 12 fragments were analysed using the scheme optimisation algorithm in MLSTest which identifies the combination of loci producing the maximum number of diploid sequence types (DSTs). Three main sequential criteria were applied to select the optimum combination of loci: firstly, monophyly of DTUs and lineage assignment; secondly, robust bootstrap values for the six major DTUs (1000 replications); and thirdly detection of genetic diversity at the intra-DTU level.

Results

PCR amplification and sequencing

All 13 gene fragments were successfully amplified using identical PCR reaction conditions (see methods) which generated discrete PCR fragments. PCR amplifications of the 13 targets were applied to an extended panel of 90 isolates obtaining more than 98% of positive PCR and amplifications produced strong amplicons and an absence of non-specific products (data not shown). We obtained amplicons of the expected length for all the assayed targets and for all the examined strains. Amplification for various DNA template concentrations was assayed via serial dilution. No difference in PCR amplifications were obtained when DNA concentrations from 20 to 100 ng were used. A total of 5,121 bp of sequence data were analysed for each strain (Table 2). There were no gaps in sequences. The number of polymorphic sites (Table 3) for each of the different fragments varied from 8 (STPP2) to 40 (Met-II). STTP2 showed the lowest discriminatory power (describing just 5 different genotypes in the dataset). Rb19 was the fragment with the highest discriminatory power identifying 21 distinct genotypes in the dataset.

Optimized scheme for MLST

Initially, Neighbor Joining trees were generated from concatenated sequences across the 13 prescreened loci which identified four monophyletic DTUs with robust bootstrap support (TcI, TcII, TcIII, TcIV, bootstrap >98%). TcVI was also monophyletic but with a relatively low support (Figure 1). Additionally, TcV was paraphyletic with Mncl2 as an outlier. The concatenated 13 fragments differentiated all 25 reference strains in terms of DSTs. We observed that bootstrap values were slightly different between the two methods (SNP duplication and average states) as they manage heterozygous sites differently. Values were higher for the SNP duplication method in most branches (Figure 1, branch values highlighted in blue) as a consequence of base duplication, which modifies the alignment and increases the informative sites used for bootstrapping. To avoid the potential for methodologically elevated bootstraps, the average states method was implemented for further analyses. From the selected 13 loci, all possible combinations of 2 to 12 loci were analysed (8,177 combinations) by implementing the MLSTest scheme optimisation algorithm. One combination of 7 loci was the best according to the proposed criteria. This combination consisted of Rb19, TcMPX, HMCOAR, RHO1, GPI, SODB and LAP discriminating all 25 strains as DSTs, and importantly categorising all separate DTUs as a monophyletic group. DTUs were also well-supported by associated bootstraps values (TcI,100; TcII, 100; TcIII, 99.8; TcIV, 88.2; TcV, 88.7; TcVI, 99.6) as illustrated in Figure 2. Combinations with higher number of loci (from 8 to 12) did not significantly increased bootstrap values of TcIV and TcV.

Figure 1

Neighbor Joining tree based on the concatenation of 13 MLST fragments.

Different DTUs are represented by vertical bars. Branch values represent bootstrap values (1000 replications), different bootstrap values indicate the method of handling heterozygous sites: SNP duplication method (first value) and average states (second values). Branch supports highlighted in blue shows branches where support for SNP duplication method was higher than the average states method. The outlier TcV is highlighted in red. Scale bar at the bottom left represents uncorrected p-distances.

Figure 2

Neighbor Joining tree based on the concatenation of 7 selected MLST fragments: Rb19, TcMPX, HMCOAR, RHO1, GPI, SODB and LAP.

Neighbor Joining tree based on the concatenation of 13 MLST fragments.

Neighbor Joining tree based on the concatenation of 7 selected MLST fragments: Rb19, TcMPX, HMCOAR, RHO1, GPI, SODB and LAP.

Different DTUs are represented by vertical bars. Branch values represent bootstrap values (1000 replications). Heterozygous sites were considered as average states (see methods). Scale bar at the bottom left represents uncorrected p-distances. We assessed whether the outlier for TcV (Mn cl2) and the low bootstrap observed for TcVI (applied to all 13 fragments) was due to incongruence among fragments. The thirteen fragment dataset was significantly incongruent (BIONJ-ILD p-value<0.001) for at least one partition which was corroborated using NJ-LILD with a permutation test and 500 replications. Significant incongruence (p-value<0.05 after Bonferroni correction) was detected in the TcV and TcVI nodes. Incongruence was likely due to strains within DTUs TcV and TcVI demonstrating apparent loss of heterozygosis (LOH) in the Met-II fragment. Excluding Met-II, the p-value for ILD was not significant (BIONJ-ILD p-value = 0.33), and the bootstrap values for TcV and TcVI exceeded 85%, furthermore tree topology was congruent with expected DTU assignment.

Reduced scheme for DTU assignment

Attempts were made to reduce the number of fragments required for DTU assignment while maintaining DST identification. All combinations of 3 and 4 fragments (1,001 combinations) from the panel of 13 fragments were analysed as described above. A reduced MLST panel incorporating TcMPX, HMCOAR, RHO1 and GPI (four loci) produced the highest bootstrap values for DTU assignment across the DTUs, TcI (99.9), TcII (100), TcIII (99.5), TcIV (86.7), TcV (100) and TcVI (96.8) (Figure 3), and discriminated 19 of 25 DSTs. Other combinations showed higher discriminatory power but presented with lower bootstrap values (data not shown). The TcMPX locus exhibits an apparent loss of heterozygosity (LOH) in the hybrid DTU TcV, retaining the TcII like allele but not the TcIII allele. Therefore DTU assignment using TcMPX alone would not assign a TcV isolate correctly. However concatenation of TcMPX with HMCOAR, RHO1 and GPI allow distinguishing TcV from TcII.

Figure 3

Neighbor Joining tree based on the concatenation of 4 selected MLST fragments (TcMPX, HMCOAR, RHO1, GPI) for DTU assignment.

Different DTUs are represented by vertical bars. Branch values represent bootstrap values (1000 replications). Heterozygous sites were handled using the average states method. Scale bar at the bottom left represents uncorrected p-distances.

Neighbor Joining tree based on the concatenation of 4 selected MLST fragments (TcMPX, HMCOAR, RHO1, GPI) for DTU assignment.

Inter and intra DTU phylogenies

Topologies obtained for the 7 and 4 loci combinations (Figures 2 and 3, respectively) were similar to the 13 loci scheme, showing consistently the two major groups (TcI-TcIII-TcIV and TcII-TcV-TcVI) supported by high bootstrap values, even when trees were rooted using TcMB7 (Figure 1). The primary difference between the 13 target concatenated phylogenies and the trees obtained for the 7 and 4 targets was that for the 13 concatenated targets TcV was paraphyletic, showing the Mncl2 strain as an outlier. Regarding inter-DTU relationships, the analysis of the concatenated 13 fragments divided DTUs into two major clusters: one composed by TcI, TcIII and TcIV, with a bootstrap value of 100%; while the remaining group containing TcII, TcV and TcVI was supported by lower bootstrap values (<70%), possibly due to presence of the two hybrid DTUs (TcV and TcVI) (Figure 1). Within clusters, internal topologies were supported with relatively high but variable bootstrap values with 4, 7 and 13 loci combinations and generally consistent intralineage topologies (Figures 1, Figure 2, Figure 3), although the panel of 25 reference strains would need to be expanded further for assessment of fine scale intralineage associations.

Discussion

Thirteen gene fragments were assessed in an optimised MLST scheme which is a combination of targets from two recently separately proposed schemes [50], [51]. Here we evaluated the optimal combination of loci based on three main sequential criteria: first, assignment to the expected DTU; second, to attain robust bootstrap values for the six major DTUs, and third to detect intra-DTU diversity. For the first time we propose an optimised MLST scheme, validated against a panel representing all known lineages, for characterisation of T. cruzi isolates. However, it should be emphasized that this MLST scheme is proposed as a typing method for T. cruzi isolates but not as a typing method to be used directly on biological samples as blood, tissues or Triatomine feces, for which more sensitive and simpler methods are needed. Moreover, we have performed assays with the purpose of determining the limit of detection of each gene fragment on blood and triatomines feces (data not shown) and we found that none of these targets are suitable for detecting T. cruzi in the normal concentration found in natural biological samples. As a result of our data analyses, we obtained one combination of 7 loci and one combination of only 4 targets which most closely adhered to the selection criteria described above. It is worth noting that the three used criteria for selecting optimum combination of targets are sequential; it means that there is a hierarchical order of these criteria. In first place, we look for obtaining monophyly for the six DTUs and accurate lineage assignment of each examined strain. In a second place, we look for obtaining robust bootstrap values for each of the six major DTUs. Finally, we expect detecting genetic diversity at the intra-DTU level. In this context, due to the hierarchical order of the criteria of selection of loci, the selected combinations will optimise the number of DSTs but subordinated to the two previous criteria. Theoretically, using these criteria, we could obtain a combination of loci that does not give the maximum number of DST for a determined DTU, because our algorithm previously prioritized obtaining monophyly and strong bootstrap values for the six DTUs. This was the case for the selected 4-loci scheme (which differentiated 19 from 25 strains). In spite of this, the selected 7-loci combination that we propose, allow us to differentiate the 25 examined strains, i.e. the maximum possible number of DSTs. The results illustrate that MLST is a highly discriminatory strain-typing technique. From these data we suggest that the 7 locus scheme provides scope for both lineage assignment and diversity studies, generating robust bootstrap values for distance based phylogenies and that a reduced panel of only four targets is sufficient for assignment to DTU level. For population genetics scale analyses and detailed epidemiological studies a comprehensive larger panel of T. cruzi isolates should be assessed by sequencing the proposed targets. The phylogenetic associations among DTUs TcI, TcII, TcIII and Tc IV are debatable. Split affinities and incongruence have been observed in nuclear phylogenies [7], [8], [51], [56]. One interpretation of phylogenetic incongruence is genetic recombination, although due to the highly plastic nature of the T. cruzi genome other causes are also possible. Mutation rates and gene conversion may create distinct levels of sequence diversity [57]. Here, concatenated phylogenies showed a partition into two main clusters for all gene combinations tested, the first consisting of TcI, TcIII and TcIV (bootstrap value = 100%); and the second composed of TcII, TcV and TcVI (bootstrap value <70%). The presence of the two known hybrid lineages (TcV and TcVI) generated artifactual phylogenetic structuring and excluding these representatives revealed clustering of DTUs TcI, TcIII and TcIV, indicating that TcI has a closer affinity to TcIII than to TcIV. TcII is the most genetically distant group which is in agreement with previous findings [9], [10], [51]. In addition, it would be interesting to analyze in the future the new lineage described as TcBat [58] using the MLST scheme proposed here, since it could shed light on the phylogenetical position of this interesting lineage. LOH observed in Met-II and TcMPX gene fragments affecting the hybrid lineages TcV and TcVI has potentially significant consequences for MLST and lineage assignment [51]. Isolates affected retain the TcII like allele and would be misassigned in single locus characterisation. For example, hybrid isolates TcV would be assigned to TcII based on TcMPX sequencing due to apparent LOH. Despite this LOH the TcMPX locus was included in the 4 target scheme to increase bootstrap support in differentiating between TcV from TcVI. Although MLST has been successfully applied to other diploid organisms including Candida albicans, the potential for heterozygous alleles complicates typing schemes. In the present work, two methods to handle heterozygous sites, SNPs duplication and average states algorithms, produced broadly similar results with SNP duplication producing marginally higher bootstraps due to the physical duplication of informative sites. Here we decided to implement the average states methodology to derive genetic distances and phylogenies. Both approaches can be found in the software MLSTest [52] producing results that enable resolution at the DTU level and an associated DP of 1 for the panel tested. A significant advantage of MLST based analysis over sequential PCR based gels is that once generated, sequences can be applied to a range of complementary downstream analyses. For example, the resolution of haplotypes for recombination analysis and investigation of more detailed evolutionary associations can be applied to population sized studies. At present, whole genome sequencing applied to large numbers of isolates is not feasible and microsatellite analysis is often difficult to reproduce precisely across laboratories, unlike MLST which has proven reproducibility both within and between laboratories [59]. However, microsatellites could be more convenient for population genetics studies at a microevolutionary level, due to their high resolution power. A further consideration in the analysis of diploid sequences is differentiating heterozygosity from copy number diversity. Ideally, we should prefer single copy genes for MLST schemes in order to avoid comparisons among paralogous. We performed in silico analyses in order to estimate the copy number of the selected targets on the genomic data of CL-Brener (TcVI) and Sylvio X10 (TcI) (http://tritrypdb.org/tritrypdb/). For these analyses, we used as query the primer sequences as well as the complete fragment sequences. These searches displayed just single matches in all cases. Consequently, we propose that all the examined MLST fragments may be considered as single copy genes, at least for typing and clustering. One of the most important aspects in any MLST scheme is to provide targets that consistently produce PCR amplicons requiring minimal cleanup and are suitable for sequencing. Although in the current protocol, we recommend purifying PCR products with a suitable commercial kit (Quiagen), in most cases, this was not necessary and sequencing was performed directly from the PCR product. The exception was TcGPXII, and very occasionally SODA produced nonspecific products, neither of which are included in final recommended panels. Although the two previously published MLST [50], [51] schemes showed promise in identifying diversity, some of the gene targets were not amenable for routine use. For example, LYT1 was discarded due to unreliable amplification and DHFR-TS due to the need for internal primers. Therefore further optimisation performed here was necessary for practical use. An important criterion for choosing targets was identifying those that used the same primers for both PCR amplification and sequencing to maintain simplicity and reduce costs. Taken together, we propose a MLST scheme validated against a panel representing all of the known lineages of T. cruzi. We propose that a 7 loci MLST scheme could provide the basis for robust DTU assignment and strain diversity studies of new isolates and a reduced 4 loci scheme for lineage assignment. Importantly, the sequence data generated can be utilised for a wide range of downstream analyses, including the resolution of haplotypes for recombination analysis, population genetics analyses, and other statistical approaches to the phyloepidemiological study of T. cruzi. Finally, we propose that the seven-fragment MLST scheme could be used as a gold standard for T. cruzi typing, against which other typing approaches, particularly single locus approaches or systematic PCR assays based on amplicon size, could be compared.

59 in total

1. Towards multilocus sequence typing of the Leishmania donovani complex: resolving genotypes and haplotypes for five polymorphic metabolic enzymes (ASAT, GPI, NH1, NH2, PGD).

Authors: Isabel L Mauricio; Matthew Yeo; Mehdi Baghaei; Daniela Doto; Francine Pratlong; Eva Zemanova; Jean-Pierre Dedet; Julius Lukes; Michael A Miles
Journal: Int J Parasitol Date: 2006-04-25 Impact factor: 3.981

Review 2. Multilocus sequence typing of pathogenic Candida species.

Authors: Frank C Odds; Mette D Jacobsen
Journal: Eukaryot Cell Date: 2008-05-02

3. Development of a new MLST scheme for differentiation of Fusarium solani Species Complex (FSSC) isolates.

Authors: Anne Debourgogne; Cécile Gueidan; Christophe Hennequin; Nelly Contet-Audonneau; Sybren de Hoog; Marie Machouart
Journal: J Microbiol Methods Date: 2010-08-02 Impact factor: 2.363

4. Candidate targets for Multilocus Sequence Typing of Trypanosoma cruzi: validation using parasite stocks from the Chaco Region and a set of reference strains.

Authors: Juan J Lauthier; Nicolás Tomasini; Christian Barnabé; María M Monje Rumi; Anahí M Alberti D'Amato; Paula G Ragone; Matthew Yeo; Michael D Lewis; Martin S Llewellyn; Miguel A Basombrío; Michael A Miles; Michel Tibayrenc; Patricio Diosque
Journal: Infect Genet Evol Date: 2011-12-22 Impact factor: 3.342

5. Isozymic heterogeneity of Trypanosoma cruzi in the first autochthonous patients with Chagas' disease in Amazonian Brazil.

Authors: M A Miles; A Souza; M Povoa; J J Shaw; R Lainson; P J Toye
Journal: Nature Date: 1978-04-27 Impact factor: 49.962

6. Population structure and properties of Candida albicans, as determined by multilocus sequence typing.

Authors: Arianna Tavanti; Amanda D Davidson; Mark J Fordyce; Neil A R Gow; Martin C J Maiden; Frank C Odds
Journal: J Clin Microbiol Date: 2005-11 Impact factor: 5.948

7. Host specificity of ribosomal DNA variation in sylvatic Trypanosoma cruzi from North America.

Authors: C G Clark; O J Pung
Journal: Mol Biochem Parasitol Date: 1994-07 Impact factor: 1.759

8. Trypanosoma cruzi discrete typing units (DTUs): microsatellite loci and population genetics of DTUs TcV and TcI in Bolivia and Peru.

Authors: Christian Barnabé; Thierry De Meeûs; François Noireau; Marie-France Bosseno; Eric Marcelo Monje; François Renaud; Simone Frédérique Brenière
Journal: Infect Genet Evol Date: 2011-07-27 Impact factor: 3.342

9. Sensitive detection and strain classification of Trypanosoma cruzi by amplification of a ribosomal RNA sequence.

Authors: R P Souto; B Zingales
Journal: Mol Biochem Parasitol Date: 1993-11 Impact factor: 1.759

10. Genome-scale multilocus microsatellite typing of Trypanosoma cruzi discrete typing unit I reveals phylogeographic structure and specific genotypes linked to human infection.

Authors: Martin S Llewellyn; Michael A Miles; Hernan J Carrasco; Michael D Lewis; Matthew Yeo; Jorge Vargas; Faustino Torrico; Patricio Diosque; Vera Valente; Sebastiao A Valente; Michael W Gaunt
Journal: PLoS Pathog Date: 2009-05-01 Impact factor: 6.823

12 in total

Review 1. Chagas Disease in the United States: a Public Health Approach.

Authors: Caryn Bern; Louisa A Messenger; Jeffrey D Whitman; James H Maguire
Journal: Clin Microbiol Rev Date: 2019-11-27 Impact factor: 26.132

Review 2. Evolution of Trypanosoma cruzi: clarifying hybridisations, mitochondrial introgressions and phylogenetic relationships between major lineages.

Authors: Nicolás Tomasini; Patricio Diosque
Journal: Mem Inst Oswaldo Cruz Date: 2015-03-24 Impact factor: 2.743

Review 3. Between a bug and a hard place: Trypanosoma cruzi genetic diversity and the clinical outcomes of Chagas disease.

Authors: Louisa A Messenger; Michael A Miles; Caryn Bern
Journal: Expert Rev Anti Infect Ther Date: 2015-08 Impact factor: 5.091

Review 4. Introgression of the Kinetoplast DNA: An Unusual Evolutionary Journey in Trypanosoma cruzi.

Authors: Nicolás Tomasini
Journal: Curr Genomics Date: 2018-02 Impact factor: 2.236

5. Dissecting the phyloepidemiology of Trypanosoma cruzi I (TcI) in Brazil by the use of high resolution genetic markers.

Authors: Fabiola Roman; Samanta das Chagas Xavier; Louisa A Messenger; Márcio G Pavan; Michael A Miles; Ana María Jansen; Matthew Yeo
Journal: PLoS Negl Trop Dis Date: 2018-05-21

6. Cytochrome c oxidase subunit 1 gene as a DNA barcode for discriminating Trypanosoma cruzi DTUs and closely related species.

Authors: Marina Silva Rodrigues; Karina Alessandra Morelli; Ana Maria Jansen
Journal: Parasit Vectors Date: 2017-10-16 Impact factor: 3.876

7. Elucidating diversity in the class composition of the minicircle hypervariable region of Trypanosoma cruzi: New perspectives on typing and kDNA inheritance.

Authors: Fanny Rusman; Nicolás Tomasini; Noelia-Floridia Yapur; Andrea F Puebla; Paula G Ragone; Patricio Diosque
Journal: PLoS Negl Trop Dis Date: 2019-06-27