Based Upon Repeat Pattern (BURP): an algorithm to characterize the long-term evolution of Staphylococcus aureus populations based on spa polymorphisms.

Alexander Mellmann, Thomas Weniger, Christoph Berssenbrügge, Jörg Rothgänger, Michael Sammeth, Jens Stoye, Dag Harmsen.

Abstract

BACKGROUND: For typing of Staphylococcus aureus, DNA sequencing of the repeat region of the protein A (spa) gene is a well-established discriminatory method for outbreak investigations. Recently, it was hypothesized that this region also reflects long-term epidemiology. However, no automated and objective algorithm existed to cluster different repeat regions. In this study, the Based Upon Repeat Pattern (BURP) implementation, a heuristic variant of the newly described EDSI algorithm, was investigated to infer the clonal relatedness of different spa types. For calibration of the BURP parameters, 400 representative S. aureus strains with different spa types were characterized by MLST and clustered using eBURST as the "gold standard" for their phylogeny. Typing concordance analyses between eBURST and BURP clustering (spa-CC) were performed using all possible BURP parameter combinations to determine the optimal one. BURP was subsequently evaluated with a strain collection reflecting the breadth of diversity of S. aureus (JCM 2002; 40:4544).
RESULTS: In total, the 400 strains exhibited 122 different MLST types. eBURST grouped them into 23 clonal complexes (CC; 354 isolates) and 33 singletons (46 isolates). BURP clustering of spa types using all possible parameter combinations and subsequent comparison with eBURST CCs resulted in concordances ranging from 8.2 to 96.2%. However, 96.2% concordance was reached only if spa types shorter than 8 repeats were excluded, which meant excluding 37% of the spa types. Therefore, the optimal combination of the BURP parameters was "exclude spa types shorter than 5 repeats" and "cluster spa types into spa-CC if cost distances are less than or equal to 4", exhibiting 95.3% concordance to eBURST. This algorithm identified 24 spa-CCs and 40 singletons, and excluded only 7.8% of the spa types. Analyzing the natural population with these parameters, the comparison of whole-genome micro-array groupings (at the level of 0.31 Pearson correlation index) and spa-CCs gave a concordance of 87.1%; BURP spa-CCs vs. manually grouped spa types resulted in 95.7% concordance.
CONCLUSION: BURP is the first automated and objective tool to infer clonal relatedness from spa repeat regions. It is able to extract an evolutionary signal rather congruent to MLST and micro-array data.

Year:  2007        PMID: 17967176      PMCID: PMC2148047          DOI: 10.1186/1471-2180-7-98

Source DB:  PubMed          Journal:  BMC Microbiol        ISSN: 1471-2180            Impact factor:   3.605


Background

Staphylococcus aureus, a human commensal living on the skin and mucosa, can cause a broad range of infections including endocarditis, septicemia, skin infections, soft tissue infections, and osteomyelitis. Moreover, S. aureus is the leading cause of nosocomial infections [1]. The application of several new genotypic typing methods has given many new insights into the epidemiology and population structure of S. aureus [2]. Recently, Koreen et al. investigated a collection of 36 S. aureus isolates (methicillin-resistant and methicillin-susceptible S. aureus, MRSA and MSSA, respectively) that were recovered from 10 countries on four continents over a period of four decades and represent the breadth of diversity within S. aureus [3]. They used whole-genome micro-array analysis (comprising approximately 2,800 open reading frames) as the typing reference to evaluate the capability of several typing techniques, among them partial S. aureus protein A (spa) gene sequencing. The spa repeat region consists of a variable number of 21–27 bp long repeats (VNTRs) that vary in composition and thereby give rise to different spa types. Previously it was shown that spa typing is fast, discriminatory, and very reproducible [4,5]. Based on manual grouping of similar spa types, Koreen and colleagues hypothesized that this region contains evolutionary signals nearly comparable to whole-genome micro-array data [3]. Until recently, however, no automated and objective algorithm existed to cluster different repeat regions. In this study, the Based Upon Repeat Pattern (BURP) implementation, a heuristic variant of the newly described EDSI algorithm [6], was investigated to infer the clonal relatedness of different spa types. We first calibrated the BURP parameters using multilocus sequence typing (MLST) data from a representative strain collection as "gold standard" and then evaluated BURP using the Koreen et al. dataset.

Methods

S. aureus strains (MRSA and MSSA) were taken from our strain collection, which comprises 400 of the initial spa types and of those most frequently reported to the SpaServer [7]. For these strains, MLST sequence types (ST) were determined as described previously [8]. STs that shared at least six of seven identical alleles were grouped into clonal complexes (CC) using eBURST [9]. BURP, as implemented in the StaphType software v. 1.5 (Ridom GmbH, Würzburg, Germany), was used to cluster spa types into spa-CCs [10]. Repeat duplication and excision, in addition to substitution and base insertion and deletion events, were taken into account when the relatedness of different spa types was calculated. BURP offers two user-defined parameters that influence clustering: exclusion of spa types that are shorter than "x" repeats, and the maximum cost "y" allowed for clustering spa types into the same group. Short spa types can be excluded from further analysis because their information content is limited and no reliable evolutionary history can be inferred from them. The costs account for the "steps" of evolution between two different spa types, and the algorithm tries to minimize these steps ("parsimony assumption"). To find the optimal combination of these two parameters, clustering was performed for all possible combinations of both parameters (values: 1 to 10). A prerequisite was that the number of excluded spa types should be as low as possible and not exceed 10% of all investigated spa types. Subsequently, the typing concordance [11] between BURP and eBURST groupings was determined to identify the parameter combination with the highest concordance on the one hand and the lowest number of excluded spa types on the other. BURP calibrated in this manner was finally used to cluster the strains from the study of Koreen et al. [3].
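The role of the two parameters can be illustrated with a minimal Python sketch. It is not the StaphType implementation: it assumes that the pairwise costs have already been computed (the EDSI cost calculation itself is not reproduced here) and that spa types are grouped by simple linkage, i.e. two spa types end up in the same group whenever a chain of pairs with cost less than or equal to "y" connects them. The function name, the input dictionaries, and the labels tA to tE are hypothetical.

```python
from itertools import combinations


def cluster_spa_types(repeat_counts, costs, min_repeats=5, max_cost=4):
    """Group spa types into clusters and singletons (illustrative sketch only).

    repeat_counts: dict mapping spa type -> number of repeats (hypothetical input).
    costs: dict mapping frozenset({a, b}) -> pairwise cost (assumed precomputed).
    """
    # Parameter "x": exclude spa types shorter than min_repeats.
    included = sorted(t for t, n in repeat_counts.items() if n >= min_repeats)
    excluded = sorted(t for t, n in repeat_counts.items() if n < min_repeats)

    # Union-find structure over the included spa types.
    parent = {t: t for t in included}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    # Parameter "y": link two spa types when their cost is <= max_cost.
    for a, b in combinations(included, 2):
        if costs[frozenset((a, b))] <= max_cost:
            parent[find(a)] = find(b)

    groups = {}
    for t in included:
        groups.setdefault(find(t), []).append(t)

    clusters = [g for g in groups.values() if len(g) > 1]
    singletons = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, singletons, excluded


# Toy data: tD is too short and is excluded; tE is far from everything.
repeat_counts = {"tA": 10, "tB": 9, "tC": 8, "tD": 3, "tE": 6}
costs = {
    frozenset(("tA", "tB")): 2, frozenset(("tA", "tC")): 7,
    frozenset(("tB", "tC")): 4, frozenset(("tA", "tE")): 9,
    frozenset(("tB", "tE")): 9, frozenset(("tC", "tE")): 9,
}
clusters, singletons, excluded = cluster_spa_types(repeat_counts, costs)
```

In this toy example tA and tC are not directly within the cost limit but are joined through tB, which is the behavior the linkage assumption implies. Sweeping `min_repeats` and `max_cost` over 1 to 10 and discarding settings that exclude more than 10% of the spa types would mirror the calibration described above.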

Results and discussion

In total, the 400 investigated strains exhibited 122 different STs. The eBURST algorithm clustered the STs into 23 CCs (354 isolates) and 33 singletons (46 isolates). BURP clustering of spa types using all possible parameter combinations and subsequent comparison with eBURST CCs resulted in concordances ranging from 8.2 to 96.2%. These concordances are illustrated in Figure 1, drawn with the Visual-XSel 9.0 software (CRGraph, München, Germany). To determine the optimal combination of the BURP parameters, a graph showing the dependence of the concordance on the minimal repeat length of included spa types (and, conversely, on the percentage of excluded spa types) for the different costs was drawn with MS Excel XP. The overall highest concordance (96.2%) lay on the curve for cost value 4 (Figure 2). On this curve, the closest integer to the first inflection point, which represents the first local maximum, was chosen as the optimal combination of the BURP parameters: "exclude spa types that are shorter than 5 repeats" and "spa types are clustered if costs are less than or equal to 4". In this way a concordance of 95.3% was achieved (Figure 2). Using these parameters, BURP clustered the spa types into 24 spa-CCs and 40 singletons. Only 31 (7.8%) of the 400 spa types were excluded with these parameters. In contrast, analysis of ungrouped spa types vs. eBURST CCs resulted in only 92.8% concordance. A population snapshot of the 369 included strains after BURP grouping is displayed in Figure 3. It shows clusters of linked spa types in spa-CCs, linked doublets, and individual unlinked spa types. Table 1 shows spa-CC004 as an example, with its spa types, corresponding STs, and CCs. In general, a high concordance between BURP and eBURST clustering can be observed. Of the 50 spa types that were clustered in spa-CC004, only three were grouped into another CC and another three were judged as singletons by MLST.
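The text does not spell out the concordance formula; it refers to [11] for the definition. As a hedged illustration only, the following Python sketch uses a pairwise definition as an assumption: a pair of isolates counts as concordant when both methods place the two isolates in the same group, or both place them in different groups. The function name and the isolate and group labels are hypothetical.

```python
from itertools import combinations


def pairwise_concordance(labels_a, labels_b):
    """Percentage of isolate pairs on which two typing methods agree.

    labels_a, labels_b: dicts mapping isolate -> group label for each method.
    Singletons need a unique label each so that they match no other isolate.
    """
    isolates = sorted(set(labels_a) & set(labels_b))
    agree = 0
    total = 0
    for i, j in combinations(isolates, 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
        total += 1
    if total == 0:
        raise ValueError("need at least two isolates typed by both methods")
    return 100.0 * agree / total


# Toy example with four isolates and two groupings.
mlst_groups = {"s1": "CC-1", "s2": "CC-1", "s3": "CC-2", "s4": "CC-2"}
spa_groups = {"s1": "spa-X", "s2": "spa-X", "s3": "spa-Y", "s4": "spa-X"}
concordance = pairwise_concordance(mlst_groups, spa_groups)
```

Here three of the six isolate pairs are treated alike by both groupings, so the toy concordance is 50%. Only isolates present in both dictionaries are compared, which corresponds to leaving out spa types excluded by the minimal repeat length parameter.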
Figure 1

Concordance analysis of eBURST and BURP clustering in dependence of all possible BURP parameters. A surface curve displaying the dependence of concordance (in %) between eBURST MLST CCs and BURP spa-CCs when applying all possible combinations of the BURP parameters "exclude spa types that are shorter than x repeats" and "spa types are clustered if costs are less than or equal to y".

Figure 2

High range of concordance between eBURST and BURP for optimal BURP calibration. Graph showing curves for cost integers in the high concordance range. Curves labeled "Costs: 1 to 10" represent different cost values. For the curve with the overall highest concordance (Costs: 4) the first inflection point is marked (arrow) and corresponds to the first local optimum giving a good balance between concordance and percentage of excluded spa types.

Figure 3

Population snapshot of the 400 S. aureus strains after grouping with the calibrated BURP ("exclude spa types that are shorter than 5 repeats" and "spa types are clustered if costs are less than or equal to 4"; 31 spa types were excluded). Clusters of linked isolates correspond to spa-CCs. Whereas eBURST uses the number of relatives (single locus variants, SLVs) to define founders and subfounders of groups, BURP sums up costs to define a founder-score for each spa type in a cluster. The spa type with the highest founder-score is defined as the founder of the cluster (blue color). Subfounders are the spa types with the second highest founder-score and are labeled in yellow. If two or more spa types exhibit the same highest founder-score, they are all colored in blue. For clarity, only the spa-CCs are labeled. Note that the spacing between linked spa types and between unlinked spa types and spa-CCs provides no information concerning the genetic distance between them.

Table 1

Comparison of BURP and eBURST clustering results

spa type | MLST ST | MLST CC
t004, t015, t028, t029, t031, t033, t038, t040, t043, t049, t050, t061, t065, t069, t073, t077, t080, t095, t102, t116, t123, t124, t130, t141, t142, t157, t161, t204, t230, t247, t266, t277, t330, t331, t333, t340, t350, t361, t370, t371, t424 | 45 | 45
t180 | 53 | 45
t220 | 54 | 45
t295 | 278 | 45

t209 | 109 | 9
t133 | 254 | 239
t412 | 846 | 395
t302 | 625 | singleton (a)
t397 | 842 | singleton
t383 | 1008 (b) | singleton

spa types, their corresponding MLST sequence types (ST), and clonal complexes (CC) of spa-CC004 are shown. (a) No clonal complex was assigned for these singletons by eBURST analysis. (b) This ST is preliminarily named ST1008 and has the allelic profile 6, 5, 6, 6, 7, 17, 19.

Comparing whole-genome micro-array groupings (at the level of 0.31 Pearson correlation index) and spa-CCs of the 36 strains from the study of Koreen et al. using the calibrated BURP gave a concordance of 87.1%, which is in the same range as reported. BURP spa-CCs vs. manually grouped spa types resulted in 95.7% concordance. The underlying alignment model of BURP takes repeat duplication and excision into account [6], in contrast to widely used multiple alignment strategies like ClustalW [12]. The proposed molecular mechanism of the evolution of such repeat regions is slipped-strand mispairing (SSM) during DNA duplication [13]. The presence of such evolutionary events within the spa repeat region has already been detected in vivo, when sequential S. aureus isolates from long-term pulmonary S. aureus colonization/infection of cystic fibrosis patients were spa typed [14]. The high concordance of BURP spa-CCs with eBURST CCs in a diverse strain collection demonstrates that spa indeed contains long-term evolutionary signals. Recent comparisons between spa-CCs and PFGE clustering corroborated these findings [15,16]. In future, the integration of BURP into the already established early-warning system for MRSA outbreaks based on spa typing will help to detect clonal diversification during extended outbreaks [17].

There are some limitations to using spa-CCs for long-term analysis. First, the strains must be spa-typeable. However, among the more than 8,000 isolates that we have typed, very few (approximately 0.1%) were not typeable, probably due to mutations within the primer binding regions. Second, BURP analyses are limited to spa types that meet the minimum number of repeats set by the exclusion parameter. However, in the SpaServer content (accessed on 12 September 2007), comprising 38,978 isolates with 2,964 different spa types, only 204 (6.88%) of all spa types and 881 (2.26%) of all submitted isolates are affected. Finally, in very few instances, discrepancies can occur between spa and other typing methods, as observed in this study and in two other recent publications [15,16]. These discrepancies are most probably due to recombinational events. Large chromosomal replacements that give rise to such typing incongruences have previously been documented experimentally for two STs [18].

Conclusion

In summary, BURP is the first automated and objective tool to infer clonal relatedness from spa repeat regions. It is able to extract an evolutionary signal rather congruent to MLST and micro-array data.

Abbreviations

BURP – Based Upon Repeat Pattern; CC – clonal complex; eBURST – electronic Based Upon Related Sequence Types; EDSI – excision and duplication of repeats, and substitution and indels of bases; MLST – multilocus sequence typing; MRSA – methicillin-resistant S. aureus; MSSA – methicillin-susceptible S. aureus; PFGE – pulsed-field gel electrophoresis; spa – S. aureus protein A encoding gene; ST – sequence type; VNTR – variable number of tandem repeats

Competing interests

J. Rothgänger and D. Harmsen have declared a potential conflict of interest. J. Rothgänger and D. Harmsen are the developers of the Ridom StaphType software mentioned in the manuscript. The software is distributed and sold by the company Ridom GmbH that is partially owned by them. All other authors have declared that no competing interests exist.

Authors' contributions

The project was coordinated by DH. AM and CB performed the laboratory work and data analysis. MS and JS developed the EDSI algorithm. TW and JR implemented BURP. AM, TW, and DH wrote the main part of the paper. All other authors gave useful comments on the analysis of the data and the text of the manuscript. All authors have read and approved the final version of the manuscript.
  17 in total

1.  National Nosocomial Infections Surveillance (NNIS) System Report, data summary from January 1992 through June 2003, issued August 2003.

Authors: 
Journal:  Am J Infect Control       Date:  2003-12       Impact factor: 2.918

2.  Typing of methicillin-resistant Staphylococcus aureus in a university hospital setting by using novel software for spa repeat determination and database management.

Authors:  Dag Harmsen; Heike Claus; Wolfgang Witte; Jörg Rothgänger; Hermann Claus; Doris Turnwald; Ulrich Vogel
Journal:  J Clin Microbiol       Date:  2003-12       Impact factor: 5.948

3.  The IS1167 insertion sequence is a phylogenetically informative marker among isolates of serotype 6B Streptococcus pneumoniae.

Authors:  D A Robinson; S K Hollingshead; J M Musser; A J Parkinson; D E Briles; M J Crain
Journal:  J Mol Evol       Date:  1998-08       Impact factor: 2.395

Review 4.  Short-sequence DNA repeats in prokaryotic genomes.

Authors:  A van Belkum; S Scherer; L van Alphen; H Verbrugh
Journal:  Microbiol Mol Biol Rev       Date:  1998-06       Impact factor: 11.056

5.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Authors:  J D Thompson; D G Higgins; T J Gibson
Journal:  Nucleic Acids Res       Date:  1994-11-11       Impact factor: 16.971

6.  Evaluation of protein A gene polymorphic region DNA sequencing for typing of Staphylococcus aureus strains.

Authors:  B Shopsin; M Gomez; S O Montgomery; D H Smith; M Waddington; D E Dodge; D A Bost; M Riehman; S Naidich; B N Kreiswirth
Journal:  J Clin Microbiol       Date:  1999-11       Impact factor: 5.948

7.  eBURST: inferring patterns of evolutionary descent among clusters of related bacterial genotypes from multilocus sequence typing data.

Authors:  Edward J Feil; Bao C Li; David M Aanensen; William P Hanage; Brian G Spratt
Journal:  J Bacteriol       Date:  2004-03       Impact factor: 3.490

8.  spa typing method for discriminating among Staphylococcus aureus isolates: implications for use of a single marker to detect genetic micro- and macrovariation.

Authors:  Larry Koreen; Srinivas V Ramaswamy; Edward A Graviss; Steven Naidich; James M Musser; Barry N Kreiswirth
Journal:  J Clin Microbiol       Date:  2004-02       Impact factor: 5.948

9.  Evolutionary models of the emergence of methicillin-resistant Staphylococcus aureus.

Authors:  D Ashley Robinson; Mark C Enright
Journal:  Antimicrob Agents Chemother       Date:  2003-12       Impact factor: 5.191

10.  Multilocus sequence typing for characterization of methicillin-resistant and methicillin-susceptible clones of Staphylococcus aureus.

Authors:  M C Enright; N P Day; C E Davies; S J Peacock; B G Spratt
Journal:  J Clin Microbiol       Date:  2000-03       Impact factor: 5.948
