Clustering of water bodies in unpolluted and polluted environments based on Escherichia coli phylogroup abundance using a simple interaction database.

Nancy de Castro Stoppe¹, Juliana Saragiotto Silva², Tatiana Teixeira Torres³, Camila Carlos⁴, Elayse Maria Hachich⁵, Maria Inês Zanoli Sato⁵, Antonio Mauro Saraiva⁶, Laura Maria Mariscal Ottoboni⁴.

Abstract

Different types of water bodies, including lakes, streams, and coastal marine waters, are often susceptible to fecal contamination from a range of point and nonpoint sources, and have been evaluated using fecal indicator microorganisms. The most commonly used fecal indicator is Escherichia coli, but traditional cultivation methods do not allow discrimination of the source of pollution. The use of triplex PCR offers an approach that is fast and inexpensive, and here enabled the identification of phylogroups. The phylogenetic distribution of E. coli subgroups isolated from water samples revealed higher frequencies of subgroups A1 and B23 in rivers impacted by human pollution sources, while subgroups D1 and D2 were associated with pristine sites, and subgroup B1 with domesticated animal sources, suggesting their use as a first screening for pollution source identification. A simple classification is also proposed based on phylogenetic subgroup distribution using the w-clique metric, enabling differentiation of polluted and unpolluted sites.

Keywords: E. coli; interaction networks; phylogenetic groups; pollution sources; social network analysis

Year: 2014 PMID: 25505844 PMCID: PMC4261969 DOI: 10.1590/S1415-47572014005000016

Source DB: PubMed Journal: Genet Mol Biol ISSN: 1415-4757 Impact factor: 1.771

Introduction

The microbiological quality of water is usually evaluated by means of fecal indicator microorganisms, and Escherichia coli has often been used because it is a normal inhabitant of the intestinal tracts of most warm-blooded animals. However, the traditional methods used hitherto have not allowed differentiation among host sources. Reliable and accurate source identification methods are extremely important for the control of fecal contamination from relevant animal origins, to protect recreational water users from waterborne pathogens, and to preserve the integrity of drinking water supplies (Roslev and Bukh, 2011; USEPA, 2005).

Clermont et al. developed a method for the assignment of E. coli isolates to four major phylogenetic groups: A, B1, D, and B2. Because it is simple and rapid, it has been widely used for purposes including studies of ecological niche differentiation and of the propensity to cause disease, and fecal source tracking (Johnson et al.; Escobar-Páramo et al.; Orsi et al., 2008; Walk et al.; Gordon et al., 2008; Carlos et al.; Ratajczak et al.; Figueira et al.). The technique is based on triplex PCR and uses a combination of three loci (chuA, yjaA, and TspE4.C2). In order to improve the discriminative power of analyses when several isolates per sample were considered, Escobar-Páramo et al. proposed the use of all the combinations of genetic markers, resulting in the definition of seven subgroups (A0, A1, B1, B22, B23, D1, and D2).

Algorithms, metrics, and computational resources for analyzing interaction networks can be used as important tools to systematically measure interdependencies among molecular markers and water bodies. The conceptual foundations of these tools are the same as in Social Network Analysis (SNA), which provides algorithms and metrics to characterize the network structure and to identify cohesive subgroups. The aim of this work was to develop a classification of E. coli strains isolated from water bodies, based on phylogenetic subgroups, and to try to associate it with the pollution sources by means of the w-clique metric.

Materials and Methods

Sample collection

Water samples from twelve rivers and reservoirs with different pollution levels in the State of São Paulo (Figure 1 and Table 1) were collected in sterilized bottles according to Standard Methods (APHA, 2010). The sampling locations belonged to the surface water monitoring network established by CETESB (the São Paulo State environmental agency), whose Surface Water Monitoring (SWM) program includes physical, chemical, and biological analysis of water in the twenty-two Watershed Management Units (WMU) located in the State of São Paulo, Brazil. Two indices are currently used for the evaluation of domestic effluent dilution and the trophic state of the water bodies. The water quality index (WQI) is derived from a combined set of variables including pH, dissolved oxygen, biological oxygen demand, E. coli, water temperature, total nitrogen, total phosphorus, total suspended matter, and turbidity. The trophic state index (TSI), on the other hand, is based on the concentrations of chlorophyll and phosphorus. The WQI values range from 0 to 100, with five intervals to indicate the water quality: 0–19 (very bad); 20–36 (bad); 37–51 (acceptable); 52–79 (good); and 80–100 (very good). The TSI ranges from < 47 to > 67, with five intervals indicating the condition of the environment: < 47 (ultraoligotrophic); 48–52 (oligotrophic); 53–59 (mesotrophic); 60–63 (eutrophic); and > 67 (hypereutrophic). The SWM program has been operated by CETESB since 1974, and all the measurements since that time have been recorded. Point pollution sources are recorded in the São Paulo State Point Source Pollution Inventory. Furthermore, events that could influence the analysis (such as animals at the sampling site, or illegal sewage discharges) are reported in the sample collection form and then recorded in the water-monitoring database. The present study used all the historical data in order to identify the main pollution source at each site. 
The samples were collected bimonthly between July 2009 and April 2010 (CETESB, 2010, 2011).
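The WQI intervals quoted above can be expressed as a small lookup. The following is only an illustrative sketch: the function name `classify_wqi` is hypothetical, and rounding non-integer values to the nearest integer is an assumption, since the text gives the class limits as integers only.

```python
# Class limits as quoted in the text; the names used here are hypothetical.
WQI_CLASSES = [
    (0, 19, "very bad"),
    (20, 36, "bad"),
    (37, 51, "acceptable"),
    (52, 79, "good"),
    (80, 100, "very good"),
]


def classify_wqi(value):
    """Return the water quality class for a WQI value between 0 and 100."""
    if not 0 <= value <= 100:
        raise ValueError("WQI must lie between 0 and 100")
    score = round(value)  # assumption: fractional values are rounded
    for low, high, label in WQI_CLASSES:
        if low <= score <= high:
            return label


print(classify_wqi(45))
```

The TSI classes are not included in the sketch because the intervals listed in the text do not cover the whole range of the index.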

Figure 1

Locations of sampling sites in the WMU (grey areas).

Table 1

Sampling sites on rivers and reservoirs, and distribution of phylogenetic subgroups.

Abbreviation	River or reservoir	Source of pollution	Average WQI	Average TSI	Main land use	Geographical coordinates	Number of isolates	Subgroup A0	Subgroup A1	Subgroup B1	Subgroup B22	Subgroup B23	Subgroup D1	Subgroup D2
BILL2801	Billings Reservoir - beach in front of a Wastewater Treatment Plant	Human	Bad	Hypereutrophic	Urban area	23°46′37″ S, 46°32′01″ W	51	13	22	5	1	7	1	2
BILL2251	Billings Reservoir - pier at the Engineering Institute camp area	Human	Acceptable	Eutrophic	Urban area	23°44′46″ S, 46°38′25″ W	53	13	14	5	2	18	0	1
GUAR0502	Guarapiranga Reservoir - Castelo country club	Human	Bad	Eutrophic	Urban area	23°42′53″ S, 46°42′58″ W	52	14	14	4	0	15	3	2
GUAR0601	Guarapiranga Reservoir - Odair restaurant	Human	Bad	Eutrophic	Urban area	23°41′57″ S, 46°44′41″ W	49	11	17	5	2	9	3	2
TIET3120	Tiete River - downstream of a WTP	Animal	Bad	Hypereutrophic	Urban and industrial area	23°30′11″ S, 46°20′13″ W	52	17	11	6	2	9	4	3
TGDE0900	Tanque Grande Reservoir - near Guarulhos	Animal	Good	Oligotrophic	Water source protection area	23°22′38″ S, 46°27′35″ W	51	19	20	2	3	4	3	0
AGUA2800	Aguapei River - city of Junqueiropolis	Animal	Good	Mesotrophic	Agricultural and livestock area	21°13′15″ S, 51°29′52″ W	54	25	0	12	0	4	6	7
JAMI2100	Jaguari-Mirim River - close to a farm	Animal	Good	Mesotrophic	Agricultural area	22°04′56″ S, 46°43′13″ W	22	17	3	2	0	0	0	0
TIET2050	Tiete River - near the river spring	Pristine	Good	Mesotrophic	Green belt area	23°33′54″ S, 46°00′57″ W	46	18	8	4	1	11	3	1
PTEI0900	Paratei River - in an environmental protection area	Pristine	Good	Mesotrophic	Environmental protection area	23°12′14″ S, 46°00′50″ W	44	27	10	3	1	2	0	1
IPIR0018	Ipiranga River - in an environmental protection area	Pristine	Very good	Ultraoligotrophic	Environmental protection area	23°20′9.4″ S, 45°08′1.4″ W	39	26	0	2	0	2	3	6
PBAL0014	Pau de Bala Stream - in an environmental protection area	Pristine	Very good	Ultraoligotrophic	Environmental protection area	23°19′57″ S, 45°07′9.6″ W	23	9	0	0	0	1	9	4

Isolation of strains

Samples were analyzed using the membrane filter technique according to U.S. Environmental Protection Agency Method 1603 (USEPA, 2002). Briefly, 0.01–100 mL volumes of water were filtered through a 0.45 μm membrane, and the membranes were incubated on modified mTEC agar (at 35 ± 0.5 °C for 2 h and then at 44.5 ± 0.2 °C for 22–24 h). Approximately ten typical colonies (red to magenta in color) from each sample (12 sites and five collections) were streaked onto Endo agar LES (Difco), incubated for 24 h at 35 °C, and tested for citrate utilization, lactose fermentation, oxidase, L-lysine decarboxylase, motility, glucose and sucrose fermentation, tryptophan deamination, indole production, urea hydrolysis, and sulfide production. Isolates showing a typical E. coli profile were re-isolated on nutrient agar, incubated for 24 h at 35 °C, and kept at −70 °C in tryptic soy broth (Difco) with 10% (v/v) glycerol prior to further analysis (ATCC, 2010).

Phylogenetic grouping

Genomic DNA was isolated from the strains with the Wizard Genomic DNA Purification Kit (Promega), used according to the manufacturer’s instructions, and the phylogenetic grouping of the E. coli strains was determined as previously described by Clermont et al. The strains were assigned to the seven phylogenetic subgroups according to the combination of PCR products obtained for the genes chuA and yjaA and the DNA fragment TspE4.C2 (in that order), as follows: A0 (−/−/−); A1 (−/+/−); B1 (−/−/+); B22 (+/+/−); B23 (+/+/+); D1 (+/−/−); and D2 (+/−/+) (Escobar-Páramo et al.).
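The assignment rule above amounts to a lookup on the presence or absence of the three PCR products. A minimal sketch follows; the function name `assign_subgroup` and the use of `None` for a profile outside the seven listed combinations are illustrative choices, not part of the published method.

```python
# Keys are (chuA, yjaA, TspE4.C2) presence flags, in the order used in the text.
SUBGROUP_BY_PROFILE = {
    (False, False, False): "A0",
    (False, True, False): "A1",
    (False, False, True): "B1",
    (True, True, False): "B22",
    (True, True, True): "B23",
    (True, False, False): "D1",
    (True, False, True): "D2",
}


def assign_subgroup(chuA, yjaA, tspE4C2):
    """Map a triplex PCR profile to one of the seven subgroups.

    Returns None for the single combination (-/+/+) that is not
    among the seven profiles listed in the text.
    """
    profile = (bool(chuA), bool(yjaA), bool(tspE4C2))
    return SUBGROUP_BY_PROFILE.get(profile)


print(assign_subgroup(True, True, True))    # B23
print(assign_subgroup(False, False, False))  # A0
```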

Statistical analysis

A chi-square test was used to determine whether differences in the distributions of phylogenetic subgroups among rivers and reservoirs were significant. Correlation analysis was performed using the Mantel test, which compares two dissimilarity matrices; these were calculated with the vegdist function using the Bray-Curtis index. These analyses were performed using the community ecology package vegan for R (Oksanen, 2011). Similarity matrices were then obtained as the complement of the dissimilarity matrices, 1 − vegdist(matrix, "bray"). Phylogenetic subgroups were clustered by their similarity matrices using the UPGMA (unweighted pair group method with arithmetic mean) algorithm, and a dendrogram was constructed using the DendroUPGMA computational tool (Garcia-Vallvé et al.).
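To make the similarity calculation concrete, the sketch below computes the Bray-Curtis dissimilarity in plain Python and takes its complement, using subgroup counts copied from Table 1. It is not the vegan implementation; applying the index to raw counts (rather than transformed or relative abundances) is an assumption, and the function name is hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both vectors are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Counts for subgroups A0, A1, B1, B22, B23, D1, D2 (Table 1).
ipir0018 = [26, 0, 2, 0, 2, 3, 6]
pbal0014 = [9, 0, 0, 0, 1, 9, 4]
bill2801 = [13, 22, 5, 1, 7, 1, 2]

# Similarity is the complement of the dissimilarity.
sim_pristine_pair = 1 - bray_curtis(ipir0018, pbal0014)  # 1 - 28/62
sim_mixed_pair = 1 - bray_curtis(bill2801, pbal0014)     # 1 - 48/74

print(round(sim_pristine_pair, 3), round(sim_mixed_pair, 3))
```

Under these assumptions the two pristine sites are more similar to each other than the human-impacted site BILL2801 is to PBAL0014.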

Social Network Analysis metric

The SNA metric w-clique has been used to identify cohesive subgroups (clusters) in network structures (Araújo et al.). A clique is a set of three or more vertices that are all connected to each other (Nooy et al.). The w-clique considers only vertex groups in which all the vertices are connected to each other by “strong” interactions, that is, interactions whose weights are higher than the average network weight.

Data analysis

The data used in the present study were obtained from a bipartite microbiological interaction database, composed of a weighted matrix (isolate abundance) in which the rows corresponded to water bodies and the columns corresponded to phylogroups (Table 1). In order to identify cohesive subgroups in a weighted interaction network, we used the program “Dieta1”, which is based on complex network theory (Araújo et al.). In this analysis, the data type used was integers, Monte Carlo bootstrapping employed 1000 replications, the diet proportion calculation used numerical sums, and the weight factor was five. A binary matrix (0/1) was obtained in which cells containing the number one represented interactions whose weights were higher than the average network weight (w-cliques). The Pajek program was used to transform the network from arcs to edges (Batagelj and Mrvar, 1998). The matrix was then submitted to the Ucinet program to identify w-cliques. Two output files were generated, one showing the cliques found (identification of the individual memberships) and the other with the cluster diagram (dendrogram) (Everett and Borgatti, 1998; Borgatti et al.).
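The thresholding step that produces the binary matrix can be sketched as follows. This illustrates only the "weight above the network average" rule using the counts in Table 1; it does not reproduce Dieta1, Pajek, or Ucinet, and it omits the bootstrapping and the clique search. Taking the average over non-zero interactions only is an assumption, as the text does not say whether empty cells are included.

```python
SUBGROUPS = ["A0", "A1", "B1", "B22", "B23", "D1", "D2"]

# Site codes and isolate counts as listed in Table 1.
COUNTS = {
    "BILL2801": [13, 22, 5, 1, 7, 1, 2],
    "BILL2251": [13, 14, 5, 2, 18, 0, 1],
    "GUAR0502": [14, 14, 4, 0, 15, 3, 2],
    "GUAR0601": [11, 17, 5, 2, 9, 3, 2],
    "TIET3120": [17, 11, 6, 2, 9, 4, 3],
    "TGDE0900": [19, 20, 2, 3, 4, 3, 0],
    "AGUA2800": [25, 0, 12, 0, 4, 6, 7],
    "JAMI2100": [17, 3, 2, 0, 0, 0, 0],
    "TIET2050": [18, 8, 4, 1, 11, 3, 1],
    "PTEI0900": [27, 10, 3, 1, 2, 0, 1],
    "IPIR0018": [26, 0, 2, 0, 2, 3, 6],
    "PBAL0014": [9, 0, 0, 0, 1, 9, 4],
}


def binarize(counts):
    """Flag interactions whose weight exceeds the mean non-zero weight."""
    weights = [w for row in counts.values() for w in row if w > 0]
    mean_weight = sum(weights) / len(weights)  # assumption: zeros excluded
    binary = {
        site: [1 if w > mean_weight else 0 for w in row]
        for site, row in counts.items()
    }
    return mean_weight, binary


mean_weight, binary = binarize(COUNTS)

# List the subgroups with a "strong" interaction at each site.
strong = {
    site: [g for g, flag in zip(SUBGROUPS, row) if flag]
    for site, row in binary.items()
}
for site, groups in strong.items():
    print(site, groups)
```

A clique search on the resulting binary network (for example, in Ucinet, as described above) would follow this step.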

Results

A total of 543 strains were isolated from twelve rivers and reservoirs (Figure 1 and Table 1), and were classified according to the phylogenetic subgroups. Selecting 10 colonies from each of 12 sites in five sampling events would be expected to yield 600 isolated strains. However, for some sites and sampling events it was not possible to obtain 10 strains, either because fewer than 10 typical colonies grew or because fewer than 10 strains were confirmed as typical in the confirmatory tests. The observed distribution of the phylogenetic subgroups among rivers (Figure 2) was significantly different from the expected frequencies (χ² = 217.22, df = 66, p < 0.005). The majority of environmental strains belonged to subgroup A0, even though river and reservoir sites had different sources of pollution.
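The degrees of freedom reported for the chi-square test follow from the shape of the site-by-subgroup table: (12 − 1) × (7 − 1) = 66. The sketch below shows how a Pearson chi-square statistic for such a contingency table can be computed from the counts in Table 1. It is illustrative only: the function name is hypothetical, the exact procedure used by the authors is not described in the text, and the counts listed in Table 1 sum to 536 isolates, so the value obtained here need not match the reported statistic.

```python
# Rows are sites in the order of Table 1; columns are A0, A1, B1, B22, B23, D1, D2.
TABLE_1_COUNTS = [
    [13, 22, 5, 1, 7, 1, 2],
    [13, 14, 5, 2, 18, 0, 1],
    [14, 14, 4, 0, 15, 3, 2],
    [11, 17, 5, 2, 9, 3, 2],
    [17, 11, 6, 2, 9, 4, 3],
    [19, 20, 2, 3, 4, 3, 0],
    [25, 0, 12, 0, 4, 6, 7],
    [17, 3, 2, 0, 0, 0, 0],
    [18, 8, 4, 1, 11, 3, 1],
    [27, 10, 3, 1, 2, 0, 1],
    [26, 0, 2, 0, 2, 3, 6],
    [9, 0, 0, 0, 1, 9, 4],
]


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of site and subgroup.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


statistic, df = chi_square_independence(TABLE_1_COUNTS)
print(f"chi-square = {statistic:.2f}, df = {df}")
```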

Figure 2

Distribution of E. coli phylogenetic subgroups in rivers and reservoirs. The circle size is proportional to phylogenetic subgroup frequency.

Only seven strains, isolated from different sites (TIET2050, TIET3120, TGDE0900, JAMI2100, PTEI2900, and BILL2801), presented a chuA−, yjaA+, TspE4.C2+ profile (data not shown), and in accordance with Rodriguez-Siek et al. these were assigned to group B1. The low frequency observed here indicates that this profile is rare, as was also observed by Higgins et al., who found it in only one out of 68 strains isolated from surface water samples. Gordon et al. characterized 662 E. coli strains, including strains from different hosts and environmental strains, and did not observe any strain matching this profile.

Although most human strains belong to group A, Orsi et al. and Carlos et al. suggested the use of group B2 as an indicator of human pollution sources, due to its recurrence in this host. In agreement with these results, the present data also revealed a high prevalence of this group at the sites strongly impacted by human sources (BILL2801, BILL2251, GUAR0502, and GUAR0601). Ten years ago, group A was the most frequent in the Billings and Guarapiranga Reservoirs, followed by groups B1, D, and B2 (Orsi et al.). The present results also showed a predominance of group A, but a decrease in B1 and a significant increase in the frequency of group B2. This group seemed to be associated with human pollution sources, which have increased in recent years. In the last decade, the populations living in the areas surrounding the dams of the Billings and Guarapiranga Reservoirs have increased by 24% and 30%, respectively. Part of this population has no access to either sewage collection or wastewater treatment, which could explain the observations. Furthermore, at these sites the WQI has changed from good to bad in the last ten years.

The sites where domesticated animal pollution sources were expected did not present similar phylogroup distribution patterns. 
TIET3120 and TGDE0900 were located downstream of cities that discharge untreated wastewater, as a result of which the phylogroup distribution was analogous to that of the sites with human pollution sources. Meanwhile, AGUA2800 and JAMI2100 were located in areas with agricultural activities and the seasonal presence of cattle, and the most frequent subgroups were A0 and B1, as observed by others (Higgins et al.; Ishii et al.; Carlos et al.). Two of the pristine sites (IPIR0018 and PBAL0014) showed a high frequency of group D, as also observed by Higgins et al. for an unpolluted site, suggesting an association with wildlife. Surprisingly, the other sites in more natural areas (TIET2050 and PTEI2900) showed higher frequencies of groups A and B2, similar to the human source sites. Sites IPIR0018 and PBAL0014 were located inside an environmental protection area (from source to mouth) and were truly pristine, while sites TIET2050 and PTEI2900 might have received some input from anthropogenic sources, despite the good water quality indices obtained for these sites during the study period (Table 1) (CETESB, 2010, 2011). As these sites were not located inside protected areas, they could have been affected by nonpoint pollution sources, or even illegal discharges. The most abundant subgroup, A0, was not used for classification of the water samples, because a previous study observed a high frequency of incorrect assignments for strains that failed to yield any PCR product (Gordon et al.). Higher frequencies of the subgroups A1 and B23 indicated human contamination, while B1 reflected domesticated animal contamination, and D1 and D2 were characteristic of pristine environments. The Mantel test showed only a weak correlation between the two quality indices, WQI and TSI (r = 0.36, p = 0.014), while the phylogenetic subgroup distribution showed no significant correlation with the WQI and TSI indices (r = 0.2537, p = 0.074). 
This observation indicates that both indices may have missed important information for pollution evaluation. Since the phylogenetic subgroup distribution seemed to be a suitable tool for identification of sources of pollution, it could be adopted for pollution classification of water bodies. In an attempt to cluster the rivers according to the source and degree of pollution, the data were evaluated using correspondence analysis; however, the sites were scattered and formed no clear groups (data not shown). A matrix of similarity among the isolates was calculated and clustered by UPGMA, and showed two groups (Figure 3). The first cluster contained two pristine sites (IPIR0018 and PBAL0014) and one animal site (AGUA2800), which was unexpected since these sites did not share similar characteristics. The same was observed in the second group, where sites with different degrees and sources of pollution were clustered together, suggesting that this tool was not appropriate for this biological enquiry.

Figure 3

Dendrogram obtained by the UPGMA cluster analysis method.

The influence of geographical location appeared to be an important factor in the distribution of phylogenetic subgroups. The w-clique metric clustered the water bodies in two groups (Figure 4). The first group contained the water bodies belonging to the São Paulo Metropolitan Area, and the other clustered those located far from the metropolitan region. The geographical location reflected the degree of pollution, since for many years the São Paulo Metropolitan Area has suffered from air and water pollution. Interestingly, at the sites where nonpoint animal pollution sources were expected (TGDE0900, JAMI2100, and AGUA2800), no specific distribution was observed, suggesting that animal sources were of lesser importance. Similar cluster results were observed when the strains belonging to subgroup A0 were removed (data not shown).

Figure 4

Dendrogram obtained using the w-clique metric, showing water body clusters.

Discussion

The distribution of the phylogenetic subgroups in environmental samples has shown dissimilar patterns. For instance, group B1 was the most common (over 70%) for beaches in California and for environmental waters surrounding sewage treatment plants in Australia (Hamilton et al.; Anastasi et al.). Previous studies of surface waters (lakes and rivers) found that half of the isolates belonged to phylogroup B1, suggesting that this was the most frequent group in environmental media (Power et al.; Hamelin et al.; Walk et al.). Importantly, the structure of an E. coli population in water can be influenced by other factors, such as the hydrological conditions in the watershed and the geographical location (Ratajczak et al.; Tenaillon et al.). Some studies reported that subgroup A0 was more environmentally adapted (Higgins et al.; Walk et al.; Figueira et al.), while B22 was the least common subgroup found in rivers and reservoirs (Figueira et al.), in line with the results obtained here.

The two water quality indices, WQI and TSI, were unable to reveal the occurrence of contamination, suggesting that other tools should also be used for pollution evaluation. On the other hand, phylogenetic subgroups could be used as a first screening for pollution source identification (subgroups A1 and B23 for human contamination, B1 for domesticated animal contamination, and D1 and D2 for pristine environments).

A new phylotyping method was recently proposed by Clermont et al. The most important improvement of the new quadruplex PCR-based method is the ability to detect E. coli strains belonging to phylogroups C, E, F, and clade I. This could have improved the discrimination power of the present analysis. However, the use of network metrics showed that the origin of the samples could be assigned even without the information for phylogroups C, E, F, and clade I. W-clique subclusters might be obtained by including rare phylogroups.

Gordon et al. demonstrated that 15–20% of Australian E. coli isolates typed as A0, D1, or D2 using the triplex PCR method were incorrectly assigned. They showed that most of the incorrect assignments were observed for strains that failed to yield any PCR products using the triplex method (Clermont et al.). To circumvent this problem, in the present work the isolates classified as A0 were excluded from the downstream analysis. In the case of the D phylogroups, it is possible that there was a pool of rare strains. However, this would not invalidate the results. As argued by Gordon et al., the triplex method (Clermont et al.) is still an excellent and cost-effective method for assigning strains of E. coli to phylogroups, because the fraction of strains that either cannot be assigned to a phylogroup or are incorrectly assigned is very low.

Using clique identification, it was therefore possible to discover new patterns in a simple interaction database, such as the clustering of water bodies (in unpolluted and polluted environments) based on phylogroup abundance. This clustering was not revealed using traditional methods, illustrating the innovative contribution of the proposed approach. The results demonstrated that the commonly used water quality indices could not address all aspects of the evaluation of domestic effluent dilution and the trophic state of the water bodies, since the TIET2050 and TGDE0900 samples presented good average scores (Table 1), but had phylogenetic group distributions that were more related to polluted sites, according to the w-clique classification. These findings suggest that the w-clique metric could be used as a complementary tool in pollution classification and evaluation of the degree of contamination of inland waters.

References

1. Pathogenic Escherichia coli found in sewage treatment plants and environmental waters.

Authors: E M Anastasi; B Matthews; H M Stratton; M Katouli
Journal: Appl Environ Microbiol Date: 2012-06-01 Impact factor: 4.792

2. Large scale analysis of virulence genes in Escherichia coli strains isolated from Avalon Bay, CA.

Authors: Matthew J Hamilton; Asbah Z Hadi; John F Griffith; Satoshi Ishii; Michael J Sadowsky
Journal: Water Res Date: 2010-06-30 Impact factor: 11.236

3. Horizontal gene transfer in glycosyl hydrolases inferred from codon usage in Escherichia coli and Bacillus subtilis.

Authors: S Garcia-Vallvé; J Palau; A Romeu
Journal: Mol Biol Evol Date: 1999-09 Impact factor: 16.240

4. Comparison of Escherichia coli isolates implicated in human urinary tract infection and avian colibacillosis.

Authors: Kylie E Rodriguez-Siek; Catherine W Giddings; Curt Doetkott; Timothy J Johnson; Mohamed K Fakhr; Lisa K Nolan
Journal: Microbiology Date: 2005-06 Impact factor: 2.777

5. Occurrence of virulence and antimicrobial resistance genes in Escherichia coli isolates from different aquatic ecosystems within the St. Clair River and Detroit River areas.

Authors: Katia Hamelin; Guillaume Bruant; Abdel El-Shaarawi; Stephen Hill; Thomas A Edge; John Fairbrother; Josée Harel; Christine Maynard; Luke Masson; Roland Brousseau
Journal: Appl Environ Microbiol Date: 2006-11-03 Impact factor: 4.792

6. Escherichia coli phylogenetic group determination and its application in the identification of the major animal source of fecal contamination.

Authors: Camila Carlos; Mathias M Pires; Nancy C Stoppe; Elayse M Hachich; Maria I Z Sato; Tânia A T Gomes; Luiz A Amaral; Laura M M Ottoboni
Journal: BMC Microbiol Date: 2010-06-01 Impact factor: 3.605

7. Large-scale population structure of human commensal Escherichia coli isolates.

Authors: Patricia Escobar-Páramo; Karine Grenet; Arnaud Le Menac'h; Luc Rode; Emmanuelle Salgado; Christine Amorin; Stéphanie Gouriou; Bertrand Picard; Mohamed Chérif Rahimy; Antoine Andremont; Erick Denamur; Raymond Ruimy
Journal: Appl Environ Microbiol Date: 2004-09 Impact factor: 4.792

8. Assigning Escherichia coli strains to phylogenetic groups: multi-locus sequence typing versus the PCR triplex method.

Authors: David M Gordon; Olivier Clermont; Heather Tolley; Erick Denamur
Journal: Environ Microbiol Date: 2008-06-02 Impact factor: 5.491

9. Relationship between phylogenetic groups, genotypic clusters, and virulence gene profiles of Escherichia coli strains from diverse human and animal sources.

Authors: Satoshi Ishii; Katriya P Meyer; Michael J Sadowsky
Journal: Appl Environ Microbiol Date: 2007-07-20 Impact factor: 4.792

10. The Clermont Escherichia coli phylo-typing method revisited: improvement of specificity and detection of new phylo-groups.

Authors: Olivier Clermont; Julia K Christenson; Erick Denamur; David M Gordon
Journal: Environ Microbiol Rep Date: 2012-12-24 Impact factor: 3.541
