
Clustering of water bodies in unpolluted and polluted environments based on Escherichia coli phylogroup abundance using a simple interaction database.

Nancy de Castro Stoppe1, Juliana Saragiotto Silva2, Tatiana Teixeira Torres3, Camila Carlos4, Elayse Maria Hachich5, Maria Inês Zanoli Sato5, Antonio Mauro Saraiva6, Laura Maria Mariscal Ottoboni4.   

Abstract

Different types of water bodies, including lakes, streams, and coastal marine waters, are often susceptible to fecal contamination from a range of point and nonpoint sources, and have been evaluated using fecal indicator microorganisms. The most commonly used fecal indicator is Escherichia coli, but traditional cultivation methods do not allow discrimination of the source of pollution. Triplex PCR offers a fast and inexpensive approach, and here it enabled the identification of phylogroups. The phylogenetic distribution of E. coli subgroups isolated from water samples revealed higher frequencies of subgroups A1 and B23 in rivers impacted by human pollution sources, whereas subgroups D1 and D2 were associated with pristine sites and subgroup B1 with domesticated animal sources, suggesting that these subgroups can be used as a first screening for pollution source identification. A simple classification based on phylogenetic subgroup distribution and the w-clique metric is also proposed, enabling the differentiation of polluted and unpolluted sites.


Keywords:  E. coli; interaction networks; phylogenetic groups; pollution sources; social network analysis

Year:  2014        PMID: 25505844      PMCID: PMC4261969          DOI: 10.1590/S1415-47572014005000016

Source DB:  PubMed          Journal:  Genet Mol Biol        ISSN: 1415-4757            Impact factor:   1.771


Introduction

The microbiological quality of water is usually evaluated by means of fecal indicator microorganisms, and Escherichia coli is often used because it is a normal inhabitant of the intestinal tract of most warm-blooded animals. However, the traditional methods used to date do not allow differentiation among host sources. Reliable and accurate source identification methods are extremely important for controlling fecal contamination from relevant animal origins, protecting recreational water users from waterborne pathogens, and preserving the integrity of drinking water supplies (Roslev and Bukh, 2011; USEPA, 2005).

Clermont developed a method for assigning E. coli isolates to four major phylogenetic groups: A, B1, D, and B2. Because of its simplicity and rapidity, it has been widely used for purposes including ecological niche differentiation, studies of the propensity to cause disease, and fecal source tracking (Johnson; Escobar-Páramo; Orsi, 2008; Walk; Gordon et al., 2008; Carlos; Ratajczak; Figueira). The technique is based on triplex PCR and uses a combination of three loci (chuA, yjaA, and TspE4.C2). To improve the discriminatory power of analyses when several isolates per sample are considered, Escobar-Páramo proposed using all the combinations of the genetic markers, resulting in the definition of seven subgroups (A0, A1, B1, B22, B23, D1, and D2).

Algorithms, metrics, and computational resources for analyzing interaction networks can be used as important tools to systematically measure interdependencies among molecular markers and water bodies. The conceptual foundations of these tools are the same as those of Social Network Analysis (SNA), which provides algorithms and metrics to characterize network structure and to identify cohesive subgroups. The aim of this work was to classify E. coli strains isolated from water bodies by phylogenetic subgroup and to try to associate the resulting subgroup distributions with pollution sources by means of the w-clique metric.

Materials and Methods

Sample collection

Water samples from twelve rivers and reservoirs with different pollution levels in the State of São Paulo (Figure 1 and Table 1) were collected in sterilized bottles according to Standard Methods (APHA, 2010). The sampling locations belong to the surface water monitoring network established by CETESB (the São Paulo State environmental agency), whose Surface Water Monitoring (SWM) program includes physical, chemical, and biological analyses of water in the twenty-two Watershed Management Units (WMU) of the State of São Paulo, Brazil. Two indices are currently used to evaluate domestic effluent dilution and the trophic state of the water bodies. The water quality index (WQI) is derived from a combined set of variables: pH, dissolved oxygen, biochemical oxygen demand, E. coli, water temperature, total nitrogen, total phosphorus, total suspended matter, and turbidity. The trophic state index (TSI), in contrast, is based on the concentrations of chlorophyll and phosphorus. WQI values range from 0 to 100 and are divided into five quality bands: 0–19 (very bad), 20–36 (bad), 37–51 (acceptable), 52–79 (good), and 80–100 (very good). The TSI classifies the environment into six trophic classes: ≤ 47 (ultraoligotrophic), 48–52 (oligotrophic), 53–59 (mesotrophic), 60–63 (eutrophic), 64–67 (supereutrophic), and > 67 (hypereutrophic). The SWM program has been operated by CETESB since 1974, and all measurements since then have been recorded. Point pollution sources are recorded in the São Paulo State Point Source Pollution Inventory. In addition, events that could influence the analysis (such as animals at the sampling site or illegal sewage discharges) are reported on the sample collection form and then recorded in the water-monitoring database. The present study used all of these historical data to identify the main pollution source at each site. The samples were collected every two months (five sampling events) between July 2009 and April 2010 (CETESB, 2010, 2011).
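The WQI and TSI bands listed above amount to simple threshold lookups. The Python sketch below is illustrative only and is not CETESB software: the names wqi_class and tsi_class are hypothetical, each band is assumed to be closed at its upper limit (a score falls in the first band whose upper limit it does not exceed), and TSI values above 63 and up to 67 are labelled supereutrophic following CETESB's six-class scheme.

```python
# Hypothetical helpers illustrating the CETESB-style WQI and TSI bands.
# Each band is treated as closed at its upper limit (an assumption).

WQI_BANDS = [
    (19, "very bad"),
    (36, "bad"),
    (51, "acceptable"),
    (79, "good"),
    (100, "very good"),
]

TSI_BANDS = [
    (47, "ultraoligotrophic"),
    (52, "oligotrophic"),
    (59, "mesotrophic"),
    (63, "eutrophic"),
    (67, "supereutrophic"),  # assumed CETESB class for scores above 63 up to 67
]


def wqi_class(score: float) -> str:
    """Return the water-quality band for a WQI score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("WQI must lie between 0 and 100")
    for upper, label in WQI_BANDS:
        if score <= upper:
            return label
    raise AssertionError("unreachable: the last band ends at 100")


def tsi_class(score: float) -> str:
    """Return the trophic class for a TSI score; anything above 67 is hypereutrophic."""
    for upper, label in TSI_BANDS:
        if score <= upper:
            return label
    return "hypereutrophic"


print(wqi_class(45), tsi_class(70))  # acceptable hypereutrophic
```

Keeping the bands as data rather than as nested if statements makes it easy to check them against the published intervals.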
Figure 1

Locations of sampling sites in the WMU (grey areas).

Table 1

Sampling sites on rivers and reservoirs, and distribution of phylogenetic subgroups.

| Abbreviation | River or reservoir | Source of pollution | Average WQI | Average TSI | Main land use | Geographical coordinates | Number of isolates | Phylogenetic subgroup counts, A0/A1/B1/B22/B23/D1/D2 (unseparated in source) |
|---|---|---|---|---|---|---|---|---|
| BILL2801 | Billings Reservoir - beach in front of a Wastewater Treatment Plant | Human | Bad | Hypereutrophic | Urban area | 23°46′37″ S, 46°32′01″ W | 51 | 132251712 |
| BILL2251 | Billings Reservoir - pier at the Engineering Institute camp area | Human | Acceptable | Eutrophic | Urban area | 23°44′46″ S, 46°38′25″ W | 53 | 1314521801 |
| GUAR0502 | Guarapiranga Reservoir - Castelo country club | Human | Bad | Eutrophic | Urban area | 23°42′53″ S, 46°42′58″ W | 52 | 1414401532 |
| GUAR0601 | Guarapiranga Reservoir - Odair restaurant | Human | Bad | Eutrophic | Urban area | 23°41′57″ S, 46°44′41″ W | 49 | 111752932 |
| TIET3120 | Tiete River - downstream of a WTP | Animal | Bad | Hypereutrophic | Urban and industrial area | 23°30′11″ S, 46°20′13″ W | 52 | 171162943 |
| TGDE0900 | Tanque Grande Reservoir - near Guarulhos | Animal | Good | Oligotrophic | Water source protection area | 23°22′38″ S, 46°27′35″ W | 51 | 192023430 |
| AGUA2800 | Aguapei River - city of Junqueiropolis | Animal | Good | Mesotrophic | Agricultural and livestock area | 21°13′15″ S, 51°29′52″ W | 54 | 250120467 |
| JAMI2100 | Jaguari-Mirim River - close to a farm | Animal | Good | Mesotrophic | Agricultural area | 22°04′56″ S, 46°43′13″ W | 22 | 17320000 |
| TIET2050 | Tiete River - near the river spring | Pristine | Good | Mesotrophic | Green belt area | 23°33′54″ S, 46°00′57″ W | 46 | 188411131 |
| PTEI0900 | Paratei River - in an environmental protection area | Pristine | Good | Mesotrophic | Environmental protection area | 23°12′14″ S, 46°00′50″ W | 44 | 271031201 |
| IPIR0018 | Ipiranga River - in an environmental protection area | Pristine | Very good | Ultraoligotrophic | Environmental protection area | 23°20′9.4″ S, 45°08′1.4″ W | 39 | 26020236 |
| PBAL0014 | Pau de Bala Stream - in an environmental protection area | Pristine | Very good | Ultraoligotrophic | Environmental protection area | 23°19′57″ S, 45°07′9.6″ W | 23 | 9000194 |

Isolation of strains

Samples were analyzed by the membrane filter technique according to U.S. Environmental Protection Agency Method 1603 (USEPA, 2002). Briefly, water volumes of 0.01–100 mL were filtered through 0.45 μm membranes, which were incubated on modified mTEC agar (at 35 ± 0.5 °C for 2 h and then at 44.5 ± 0.2 °C for 22–24 h). Approximately ten typical colonies (red to magenta) from each sample (12 sites and five collections) were streaked onto Endo agar LES (Difco), incubated for 24 h at 35 °C, and tested for citrate utilization, lactose fermentation, oxidase, L-lysine decarboxylase, motility, glucose and sucrose fermentation, tryptophan deamination, indole production, urea hydrolysis, and sulfide production. Isolates with a typical E. coli profile were re-isolated on nutrient agar, incubated for 24 h at 35 °C, and stored at −70 °C in tryptic soy broth (Difco) with 10% (v/v) glycerol prior to further analysis (ATCC, 2010).
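The filtered volumes span several orders of magnitude so that at least one membrane yields a countable number of colonies. The usual membrane-filter calculation converts a count into a density as colonies divided by the volume filtered, scaled to 100 mL. The Python sketch below is not part of the authors' workflow: the function name is hypothetical, it pools every plate whose count lies inside a counting range, and the default range of 20–80 colonies is an assumption to be checked against Method 1603 itself.

```python
# Hypothetical helper: E. coli density (CFU per 100 mL) from membrane-filter plates.
# Assumes density = colonies / volume filtered * 100, pooling every plate whose
# colony count lies inside the counting range [lo, hi] (default range assumed).


def ecoli_per_100ml(plates, lo=20, hi=80):
    """plates: iterable of (colony_count, volume_filtered_ml) pairs for one sample."""
    usable = [(count, vol) for count, vol in plates if lo <= count <= hi]
    if not usable:
        raise ValueError("no plate has a colony count inside the counting range")
    colonies = sum(count for count, _ in usable)
    volume_ml = sum(vol for _, vol in usable)
    return colonies / volume_ml * 100


# Three dilutions of one made-up sample: only the 1 mL plate is countable.
print(ecoli_per_100ml([(250, 10.0), (35, 1.0), (3, 0.1)]))  # 3500.0
```

Pooling colonies and volumes across all countable plates, rather than averaging per-plate densities, weights each plate by the volume it represents.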

Phylogenetic grouping

Genomic DNA was extracted from the strains with the Wizard Genomic DNA Purification Kit (Promega) according to the manufacturer’s instructions, and the phylogenetic group of each E. coli strain was determined as previously described by Clermont. The strains were assigned to the seven phylogenetic subgroups according to the combination of PCR products for the genes chuA and yjaA and the DNA fragment TspE4.C2 (given in the order chuA/yjaA/TspE4.C2): A0 (−/−/−); A1 (−/+/−); B1 (−/−/+); B22 (+/+/−); B23 (+/+/+); D1 (+/−/−); and D2 (+/−/+) (Escobar-Páramo).
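The subgroup rules above form a lookup table keyed on the three PCR results. In the minimal Python sketch below, the names are hypothetical and each marker is encoded as True when its PCR product is present. The eighth possible profile (chuA−/yjaA+/TspE4.C2+), which the list above does not cover, is mapped to B1, as was done for the seven such strains reported in the Results. The helper major_group collapses subgroups to the four groups A, B1, B2, and D.

```python
from collections import Counter

# Hypothetical lookup from triplex PCR results to phylogenetic subgroups.
# Key order: (chuA, yjaA, TspE4.C2); True means the PCR product was detected.
SUBGROUPS = {
    (False, False, False): "A0",
    (False, True, False): "A1",
    (False, False, True): "B1",
    (False, True, True): "B1",   # rare profile, assigned to B1 in the Results
    (True, True, False): "B22",
    (True, True, True): "B23",
    (True, False, False): "D1",
    (True, False, True): "D2",
}

MAJOR_GROUP = {"A0": "A", "A1": "A", "B1": "B1", "B22": "B2", "B23": "B2",
               "D1": "D", "D2": "D"}


def assign_subgroup(chuA: bool, yjaA: bool, tspE4C2: bool) -> str:
    """Return the phylogenetic subgroup for one isolate's PCR profile."""
    return SUBGROUPS[(chuA, yjaA, tspE4C2)]


def major_group(subgroup: str) -> str:
    """Collapse a subgroup to one of the four major groups A, B1, B2, D."""
    return MAJOR_GROUP[subgroup]


# Tally subgroups for a few made-up isolates from one site.
profiles = [(True, True, True), (False, True, False), (True, True, True)]
counts = Counter(assign_subgroup(*p) for p in profiles)
print(counts)  # Counter({'B23': 2, 'A1': 1})
```

Per-site tallies like counts are exactly the rows of a site × subgroup abundance matrix such as Table 1.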

Statistical analysis

A chi-square test was used to determine whether the differences in the distribution of phylogenetic subgroups among rivers and reservoirs were significant. Correlation analysis was performed with the Mantel test, comparing two dissimilarity matrices calculated with the vegdist function using the Bray-Curtis index. These analyses were performed with Vegan, the community ecology package for R (Oksanen, 2011). Similarity matrices were then obtained as the complement of the dissimilarity matrices, 1 - vegdist(matrix, method = "bray"). The water bodies were clustered on the basis of their phylogenetic subgroup similarity matrix using the UPGMA (unweighted pair group method with arithmetic mean) algorithm, and a dendrogram was constructed with the DendroUPGMA computational tool (Garcia-Vallvé).
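The Bray-Curtis and UPGMA steps above were run with vegan's vegdist and the DendroUPGMA tool. For readers without R, the Python sketch below reimplements the same ideas from scratch: Bray-Curtis dissimilarity between subgroup-count profiles, its complement as a similarity, and average-linkage (UPGMA) clustering. It is illustrative rather than the authors' code; the site profiles are made-up numbers, not values from Table 1, and ties between equal distances are broken by iteration order.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum |u_i - v_i| / sum (u_i + v_i)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering of a square distance matrix.

    Returns the tree as nested tuples of labels and the list of merge heights.
    """
    def pair(a, b):
        return (a, b) if a < b else (b, a)

    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    heights = []
    next_id = n
    while len(clusters) > 1:
        (i, j), h = min(d.items(), key=lambda kv: kv[1])  # closest pair
        ti, ni = clusters.pop(i)
        tj, nj = clusters.pop(j)
        del d[(i, j)]
        for k in clusters:
            # UPGMA update: size-weighted mean of the two old distances
            dk = (ni * d.pop(pair(i, k)) + nj * d.pop(pair(j, k))) / (ni + nj)
            d[(k, next_id)] = dk
        clusters[next_id] = ((ti, tj), ni + nj)
        heights.append(h)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree, heights


profiles = {  # made-up subgroup counts for three sites, not Table 1 data
    "S1": [10, 2, 0],
    "S2": [9, 3, 0],
    "S3": [0, 1, 10],
}
names = list(profiles)
dist = [[bray_curtis(profiles[a], profiles[b]) for b in names] for a in names]
similarity = [[1 - x for x in row] for row in dist]  # complement, as in 1 - vegdist(...)
tree, heights = upgma(names, dist)
print(tree)  # ('S3', ('S1', 'S2'))
```

If SciPy is available, the same clustering can be obtained from a condensed Bray-Curtis matrix with average linkage; the hand-written version simply keeps every step visible.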

Social Network Analysis metric

The SNA metric w-clique has been used to identify cohesive subgroups (clusters) in network structures (Araújo). A clique is a set of three or more vertices that are all connected to each other (Nooy). A w-clique is a group of vertices in which every pair is connected by a “strong” interaction, that is, one whose weight is higher than the average network weight.
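The w-clique definition above can be read as a two-step procedure: discard every edge whose weight does not exceed the mean edge weight, then search the remaining strong graph for maximal cliques of at least three vertices. The Python sketch below follows that reading with the Bron-Kerbosch algorithm. It is an illustration under that assumption, not the Ucinet implementation, and the example graph and function names are hypothetical.

```python
def strong_edges(weights):
    """Keep edges heavier than the mean edge weight (the "strong" interactions).

    weights: non-empty dict {(u, v): weight} of an undirected graph.
    """
    mean_w = sum(weights.values()) / len(weights)
    return {edge for edge, w in weights.items() if w > mean_w}


def maximal_cliques(edges):
    """Bron-Kerbosch enumeration (no pivoting) of maximal cliques."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: vertices already processed
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def w_cliques(weights, min_size=3):
    """Maximal cliques of at least `min_size` vertices in the strong-edge graph."""
    return [c for c in maximal_cliques(strong_edges(weights)) if len(c) >= min_size]


weights = {("a", "b"): 5, ("a", "c"): 6, ("b", "c"): 7,
           ("c", "d"): 1, ("d", "e"): 2, ("b", "d"): 1}
print(w_cliques(weights))  # one clique containing a, b and c
```

Here the mean edge weight is 22/6 ≈ 3.7, so only the a-b, a-c, and b-c edges are strong, and they form the single w-clique.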

Data analysis

The data used in the present study were obtained from a bipartite microbiological interaction database consisting of a weighted matrix (isolate abundance), in which rows corresponded to water bodies and columns to phylogroups (Table 1). To identify cohesive subgroups in this weighted interaction network, we used the program “Dieta1”, which is based on complex network theory (Araújo). In this analysis, the data type was set to integers, Monte Carlo bootstrapping used 1000 replications, diet proportions were calculated from numerical sums, and the weight factor was five. The result was a binary (0/1) matrix in which a cell containing 1 represented an interaction whose weight was higher than the average network weight (w-clique). The Pajek program was used to transform the network from arcs (directed links) to edges (undirected links) (Batagelj and Mrvar, 1998). The resulting matrix was submitted to the Ucinet program to identify w-cliques. Two output files were generated: one listing the cliques found (individual memberships) and the other containing the cluster diagram (dendrogram) (Everett and Borgatti, 1998; Borgatti).
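The binarization step described above, which keeps only interactions heavier than the average network weight, can be sketched on a water body × phylogroup matrix. In this hypothetical Python example the average is taken over the nonzero cells (the existing interactions), and two water bodies are linked in a one-mode projection when they share at least one strong phylogroup. Both choices are assumptions for illustration; the sketch does not reproduce Dieta1's bootstrapping or the Pajek and Ucinet processing.

```python
from itertools import combinations


def binarize(matrix):
    """Map {site: [weight per phylogroup]} to 0/1, keeping cells above the mean.

    The mean is computed over nonzero cells, i.e. existing interactions
    (an assumption about what "average network weight" means here).
    """
    weights = [w for row in matrix.values() for w in row if w > 0]
    mean_w = sum(weights) / len(weights)
    return {site: [1 if w > mean_w else 0 for w in row] for site, row in matrix.items()}


def project(binary):
    """Link two sites that share >= 1 strong phylogroup; weight = shared count."""
    links = {}
    for a, b in combinations(binary, 2):
        shared = sum(x & y for x, y in zip(binary[a], binary[b]))
        if shared:
            links[(a, b)] = shared
    return links


# Made-up counts for three sites and three phylogroups (not Table 1 data).
counts = {"S1": [10, 0, 2], "S2": [8, 1, 0], "S3": [0, 0, 9]}
strong = binarize(counts)
print(strong)           # {'S1': [1, 0, 0], 'S2': [1, 0, 0], 'S3': [0, 0, 1]}
print(project(strong))  # {('S1', 'S2'): 1}
```

The projected site network is the kind of one-mode graph on which cliques among water bodies can then be searched.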

Results

A total of 543 strains were isolated from the twelve rivers and reservoirs (Figure 1 and Table 1) and classified into phylogenetic subgroups. Selecting 10 colonies from each of 12 sites in five sampling events would have yielded 600 strains. However, for some sites and sampling events it was not possible to obtain 10 strains, either because fewer than 10 typical colonies grew or because fewer than 10 colonies were confirmed as E. coli in the biochemical tests. The observed distribution of phylogenetic subgroups among rivers (Figure 2) differed significantly from the expected frequencies (χ2 = 217.22, df = 66, p < 0.005). Most environmental strains belonged to subgroup A0, even though the river and reservoir sites had different sources of pollution.
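The degrees of freedom reported above follow from the table shape: a contingency table with r sites and c subgroups has (r - 1)(c - 1) degrees of freedom, so 12 sites and 7 subgroups give 11 × 6 = 66, and the nominal design of 12 sites × 5 sampling events × 10 colonies gives the 600 expected isolates. The Python sketch below computes Pearson's chi-square statistic of independence from observed counts. It is illustrative, uses a hypothetical toy table, and omits the p-value, which requires the chi-square distribution (for example scipy.stats.chi2.sf(stat, df) if SciPy is available).

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table: list of rows (e.g. sites) of observed counts per column (e.g. subgroups).
    Rows or columns that sum to zero must be dropped first (expected count 0).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


print((12 - 1) * (7 - 1))  # 66 degrees of freedom for 12 sites x 7 subgroups
print(12 * 5 * 10)         # 600 isolates expected from the sampling design

toy = [[10, 20], [20, 10]]  # hypothetical 2 x 2 table
print(chi_square_independence(toy))
```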
Figure 2

Distribution of E. coli phylogenetic subgroups in rivers and reservoirs. The circle size is proportional to phylogenetic subgroup frequency.

Only seven strains, isolated from different sites (TIET2050, TIET3120, TGDE0900, JAMI2100, PTEI2900, and BILL2801), presented a chuA−, yjaA+, TspE4.C2+ profile (data not shown), and, in accordance with Rodriguez-Siek, they were assigned to group B1. The low frequency observed here indicates that this profile is rare, as also observed by Higgins, who found it in only one of 68 strains isolated from surface water samples. Gordon characterized 662 E. coli strains, including strains from different hosts and environmental strains, and did not find any strain with this profile.

Although most human strains belong to group A, Orsi and Carlos suggested the use of group B2 as an indicator of human pollution sources because of its prevalence in this host. In agreement with these results, the present data also revealed a high prevalence of this group at the sites strongly impacted by human sources (BILL2801, BILL2251, GUAR0502, and GUAR0601). Ten years ago, group A was the most frequent in the Billings and Guarapiranga Reservoirs, followed by groups B1, D, and B2 (Orsi). The present results also showed a predominance of group A, but a decrease in B1 and a significant increase in the frequency of group B2. This group seemed to be associated with human pollution sources, which have increased in recent years. In the last decade, the populations living in the areas surrounding the dams of the Billings and Guarapiranga Reservoirs have increased by 24% and 30%, respectively. Part of this population has no access to sewage collection or wastewater treatment, which could explain these observations. Furthermore, the WQI at these sites has changed from good to bad over the last ten years.

The sites where domesticated animal pollution sources were expected did not share a common phylogroup distribution pattern. TIET3120 and TGDE0900 were located downstream of cities that discharge untreated wastewater, and as a result their phylogroup distributions were similar to those of the sites with human pollution sources. In contrast, AGUA2800 and JAMI2100 were located in areas with agricultural activities and the seasonal presence of cattle, and their most frequent subgroups were A0 and B1, as observed by others (Higgins; Ishii; Carlos). Two of the pristine sites (IPIR0018 and PBAL0014) showed a high frequency of group D, as also observed by Higgins for an unpolluted site, suggesting an association with wildlife. Surprisingly, the other sites in more natural areas (TIET2050 and PTEI2900) showed higher frequencies of groups A and B2, similar to the human source sites. Sites IPIR0018 and PBAL0014 were located inside an environmental protection area (from source to mouth) and were truly pristine, while sites TIET2050 and PTEI2900 might have received some input from anthropogenic sources, despite the good water quality indices obtained for these sites during the study period (Table 1) (CETESB, 2010, 2011). Because these sites were not located inside protected areas, they could have been affected by nonpoint pollution sources or even illegal discharges.

The most abundant subgroup, A0, was not used to classify the water samples, because a previous study observed a high frequency of incorrect assignments for strains that failed to yield any PCR product (Gordon). Higher frequencies of subgroups A1 and B23 indicated human contamination, B1 reflected domesticated animal contamination, and D1 and D2 were characteristic of pristine environments. The Mantel test showed only a weak correlation between the two quality indices, WQI and TSI (r = 0.36, p = 0.014), and no significant correlation between the phylogenetic subgroup distribution and the WQI and TSI indices (r = 0.2537, p = 0.074). This observation indicates that both indices may miss important information for pollution evaluation. Since the phylogenetic subgroup distribution seemed to be a suitable tool for identifying sources of pollution, it could be adopted for the pollution classification of water bodies.

In an attempt to cluster the rivers according to the source and degree of pollution, the data were first evaluated by correspondence analysis; however, the sites were widely scattered (data not shown). A similarity matrix among the water bodies was then calculated from their subgroup distributions and clustered by UPGMA, yielding two groups (Figure 3). The first cluster contained two pristine sites (IPIR0018 and PBAL0014) and one animal-impacted site (AGUA2800), which was unexpected because these sites did not share similar characteristics. The same was observed in the second group, where sites with different degrees and sources of pollution were clustered together, suggesting that this tool was not appropriate for this biological question.
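The Mantel correlations reported above measure how strongly two distance matrices over the same sites agree. A minimal permutation version is sketched below in Python. It assumes Pearson correlation of the upper-triangle entries and a one-sided p-value of (hits + 1) / (permutations + 1), where hits counts random relabellings of the sites whose correlation is at least as large as the observed one. This mirrors the usual Mantel design but is not vegan's code; the function names and the example matrices are made up.

```python
import random
from itertools import combinations


def upper_triangle(matrix, order):
    """Entries above the diagonal after reordering rows and columns by `order`."""
    return [matrix[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p): observed correlation and one-sided permutation p-value."""
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # relabel the sites of d2 at random
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Two made-up 4-site distance matrices that largely agree.
a = [[0, 1, 4, 5], [1, 0, 3, 4], [4, 3, 0, 1], [5, 4, 1, 0]]
b = [[0, 2, 5, 6], [2, 0, 4, 4], [5, 4, 0, 2], [6, 4, 2, 0]]
r, p = mantel(a, b)
print(round(r, 3), p)
```

Permuting rows and columns together keeps each matrix internally consistent, which is why the test shuffles site labels rather than individual distances.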
Figure 3

Dendrogram obtained by the UPGMA cluster analysis method.

The influence of geographical location appeared to be an important factor in the distribution of phylogenetic subgroups. The w-clique metric clustered the water bodies into two groups (Figure 4). The first group contained the water bodies in the São Paulo Metropolitan Area, and the second those located far from the metropolitan region. Geographical location reflected the degree of pollution, since the São Paulo Metropolitan Area has suffered from air and water pollution for many years. Interestingly, the sites where nonpoint animal pollution sources were expected (TGDE0900, JAMI2100, and AGUA2800) showed no specific distribution, suggesting that animal sources were of lesser importance. Similar clusters were obtained when the strains belonging to subgroup A0 were removed (data not shown).
Figure 4

Dendrogram obtained using the w-clique metric, showing water body clusters.


Discussion

The distribution of phylogenetic subgroups in environmental samples has shown dissimilar patterns. For instance, group B1 was the most common (over 70%) at beaches in California and in environmental waters surrounding sewage treatment plants in Australia (Hamilton; Anastasi). Previous studies of surface waters (lakes and rivers) found that half of the isolates belonged to phylogroup B1, suggesting that this was the most frequent group in environmental media (Power; Hamelin; Walk). Importantly, the structure of an E. coli population in water can be influenced by other factors, such as the hydrological conditions in the watershed and the geographical location (Ratajczak; Tenaillon). Some studies reported that subgroup A0 was more environmentally adapted (Higgins; Walk; Figueira), while B22 was the least common subgroup found in rivers and reservoirs (Figueira), in line with the results obtained here.

The two water quality indices, WQI and TSI, were unable to reveal the occurrence of contamination, suggesting that other tools should also be used for pollution evaluation. Phylogenetic subgroups, on the other hand, could be used as a first screening for pollution source identification (subgroups A1 and B23 for human contamination, B1 for domesticated animal contamination, and D1 and D2 for pristine environments).

A new phylotyping method was recently proposed by Clermont. The most important improvement of this quadruplex PCR-based method is its ability to detect E. coli strains belonging to phylogroups C, E, F, and clade I, which could have improved the discriminatory power of the present analysis. However, the network metrics showed that the origin of the samples could be assigned even without information on phylogroups C, E, F, and clade I. W-clique subclusters might be obtained by including rare phylogroups.

Gordon demonstrated that 15–20% of Australian E. coli isolates typed as A0, D1, or D2 by the triplex PCR method were incorrectly assigned, and that most of the incorrect assignments occurred for strains that failed to yield any PCR product with the triplex method (Clermont). To circumvent this problem, the isolates classified as A0 were excluded from the downstream analysis in the present work. In the case of the D phylogroups, the isolates may have included a pool of rare strains; however, this would not invalidate the results. As argued by Gordon, the triplex method (Clermont) is still an excellent and cost-effective method for assigning E. coli strains to phylogroups, because the fraction of strains that cannot be assigned or are incorrectly assigned is very low.

Using clique identification, it was therefore possible to discover new patterns in a simple interaction database, such as the clustering of water bodies from unpolluted and polluted environments based on phylogroup abundance. This clustering was not revealed by traditional methods, illustrating the contribution of the proposed approach. The results demonstrated that the commonly used water quality indices could not address all aspects of the evaluation of domestic effluent dilution and the trophic state of the water bodies, since the TIET2050 and TGDE0900 samples had good average scores (Table 1) but phylogenetic group distributions more similar to those of polluted sites, according to the w-clique classification. These findings suggest that the w-clique metric could be used as a complementary tool for pollution classification and for evaluating the degree of contamination of inland waters.
References

1.  Pathogenic Escherichia coli found in sewage treatment plants and environmental waters.

Authors:  E M Anastasi; B Matthews; H M Stratton; M Katouli
Journal:  Appl Environ Microbiol       Date:  2012-06-01       Impact factor: 4.792

2.  Large scale analysis of virulence genes in Escherichia coli strains isolated from Avalon Bay, CA.

Authors:  Matthew J Hamilton; Asbah Z Hadi; John F Griffith; Satoshi Ishii; Michael J Sadowsky
Journal:  Water Res       Date:  2010-06-30       Impact factor: 11.236

3.  Horizontal gene transfer in glycosyl hydrolases inferred from codon usage in Escherichia coli and Bacillus subtilis.

Authors:  S Garcia-Vallvé; J Palau; A Romeu
Journal:  Mol Biol Evol       Date:  1999-09       Impact factor: 16.240

4.  Comparison of Escherichia coli isolates implicated in human urinary tract infection and avian colibacillosis.

Authors:  Kylie E Rodriguez-Siek; Catherine W Giddings; Curt Doetkott; Timothy J Johnson; Mohamed K Fakhr; Lisa K Nolan
Journal:  Microbiology       Date:  2005-06       Impact factor: 2.777

5.  Occurrence of virulence and antimicrobial resistance genes in Escherichia coli isolates from different aquatic ecosystems within the St. Clair River and Detroit River areas.

Authors:  Katia Hamelin; Guillaume Bruant; Abdel El-Shaarawi; Stephen Hill; Thomas A Edge; John Fairbrother; Josée Harel; Christine Maynard; Luke Masson; Roland Brousseau
Journal:  Appl Environ Microbiol       Date:  2006-11-03       Impact factor: 4.792

6.  Escherichia coli phylogenetic group determination and its application in the identification of the major animal source of fecal contamination.

Authors:  Camila Carlos; Mathias M Pires; Nancy C Stoppe; Elayse M Hachich; Maria I Z Sato; Tânia A T Gomes; Luiz A Amaral; Laura M M Ottoboni
Journal:  BMC Microbiol       Date:  2010-06-01       Impact factor: 3.605

7.  Large-scale population structure of human commensal Escherichia coli isolates.

Authors:  Patricia Escobar-Páramo; Karine Grenet; Arnaud Le Menac'h; Luc Rode; Emmanuelle Salgado; Christine Amorin; Stéphanie Gouriou; Bertrand Picard; Mohamed Chérif Rahimy; Antoine Andremont; Erick Denamur; Raymond Ruimy
Journal:  Appl Environ Microbiol       Date:  2004-09       Impact factor: 4.792

8.  Assigning Escherichia coli strains to phylogenetic groups: multi-locus sequence typing versus the PCR triplex method.

Authors:  David M Gordon; Olivier Clermont; Heather Tolley; Erick Denamur
Journal:  Environ Microbiol       Date:  2008-06-02       Impact factor: 5.491

9.  Relationship between phylogenetic groups, genotypic clusters, and virulence gene profiles of Escherichia coli strains from diverse human and animal sources.

Authors:  Satoshi Ishii; Katriya P Meyer; Michael J Sadowsky
Journal:  Appl Environ Microbiol       Date:  2007-07-20       Impact factor: 4.792

10.  The Clermont Escherichia coli phylo-typing method revisited: improvement of specificity and detection of new phylo-groups.

Authors:  Olivier Clermont; Julia K Christenson; Erick Denamur; David M Gordon
Journal:  Environ Microbiol Rep       Date:  2012-12-24       Impact factor: 3.541
