Application of oligonucleotide microarrays for bacterial source tracking of environmental Enterococcus sp. isolates.

Karl J Indest, Kelley Betts, John S Furey.

Abstract

In an effort toward adapting new and defensible methods for assessing and managing the risk posed by microbial pollution, we evaluated the utility of oligonucleotide microarrays for bacterial source tracking (BST) of environmental Enterococcus sp. isolates derived from various host sources. Current bacterial source tracking approaches rely on various phenotypic and genotypic methods to identify sources of bacterial contamination resulting from point or non-point pollution. For this study, Enterococcus sp. isolates originating from deer, bovine, gull, and human sources were examined using microarrays. Isolates were subjected to BOX-PCR amplification, and the resulting amplification products were labeled with Cy5. Fluorescently labeled templates were hybridized to in-house constructed nonamer oligonucleotide microarrays consisting of 198 probes. Microarray hybridization profiles were obtained using the ArrayPro image analysis software. Principal Components Analysis (PCA) and Hierarchical Cluster Analysis (HCA) were compared for their ability to visually cluster microarray hybridization profiles based on the environmental source from which the Enterococcus sp. isolates originated. PCA was visually superior at separating origin-specific clusters, even with as few as 3 factors. A Soft Independent Modeling (SIM) classification confirmed the PCA, resulting in zero misclassifications using 5 factors for each class. The implication of these results for the application of random oligonucleotide microarrays to BST is that, given the reproducibility issues, factor-based variable selection such as in PCA and SIM greatly outperforms dendrogram-based similarity measures such as in HCA and K-Nearest Neighbor (KNN).

Year: 2005 PMID: 16705816 PMCID: PMC3814713 DOI: 10.3390/ijerph2005010175

Journal: Int J Environ Res Public Health ISSN: 1660-4601

Introduction

As the number of beach closings and advisories continues to rise, so does the public’s concern regarding microbial pollution in recreational waters. In a survey of more than 230 U.S. coastal and Great Lakes communities, there were a total of at least 13,410 days of beach closings or advisories during 2001 [1]. The majority of beach closings and advisories were based on elevated levels of fecal contamination as measured by fecal indicator bacteria, such as Escherichia coli and enterococci. Under section 303(d) of the 1972 Clean Water Act, states, territories, and authorized tribes are required to develop pollutant-specific lists of impaired waters and may be required to establish a total maximum daily load (TMDL) for those impaired waters [2]. TMDLs specify the maximum amount of a pollutant that a water body can receive and still meet water quality standards. Fecal coliforms are frequently listed as an impairment on many states’ 303(d) lists of water-quality impairments [3]. While TMDLs have historically focused on chemical impairments, more attention is now being focused on microbial impairments. Recently, the EPA published an extensive protocol for developing pathogen TMDLs [2]. Currently, several regional pilot projects are underway aimed at establishing fecal coliform TMDLs for impacted watersheds [4]. Reducing the loads of fecal contamination can be problematic because the pollution sources are often unknown or non-point in origin. Non-point sources of microbial fecal pollution are mobilized by rain and snow events and can include urban litter, agricultural runoff, failing sewer lines, malfunctioning septic systems, and domestic and wildlife excrement. Implementation of best management practices (BMPs) for TMDL compliance depends on accurately identifying the source(s) of the impairment.
Source tracking of non-point sources of microbial pollution, specifically indicator bacteria, has been generically referred to as bacterial source tracking (BST) [5] or microbial source tracking (MST) [6,7] and can be accomplished using a collection of multidisciplinary bacterial subtyping methods. In addition to determining the origin of fecal contamination, BST methods can differentiate between human and non-human sources of microbial pollution [6,7], which can support more accurate assessments of the risk posed by microbial pollution. BST methods can be divided into two general groups: 1) phenotypic or biochemical methods, and 2) genotypic or molecular methods [7]. Among the phenotypic methods, multiple antibiotic resistance (MAR) analysis is the most widely reported and has been successful in 1) discriminating human and animal sources of E. coli or fecal streptococci [8, 9, 10] and 2) further discriminating animal sources by animal type [11]. This method involves isolating and culturing target indicator organisms from various sources and locations to create a reference library. These isolates are subsequently replica plated on selective media containing multiple antibiotics at a range of concentrations. The resulting antibiotic susceptibility patterns are subjected to discriminant analysis and compared to the reference library to identify the source. Reliability of the method is determined by analyzing isolates both as standards and as unknowns. The number of isolates assigned to the correct categories divided by the total number of isolates is referred to as the average rate of correct classification (ARCC) [12]. ARCC values for this method range from 62% to 94% when individuals are compared. Despite the success of this method in simple watersheds [11], some researchers have indicated that MAR lacks the sensitivity, reproducibility, and host specificity needed for BST [13].
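The ARCC definition above reduces to a simple ratio. The Python sketch below computes it from paired lists of known and assigned sources; the labels are hypothetical, not data from this study. It also reports per-source rates, because some BST studies instead average those per-source rates, which gives a different number when sources are unevenly represented.

```python
from collections import Counter


def arcc(true_sources, predicted_sources):
    """Fraction of isolates assigned to their correct source (correct / total),
    following the definition given in the text."""
    if len(true_sources) != len(predicted_sources):
        raise ValueError("label lists must have the same length")
    if not true_sources:
        raise ValueError("no isolates to score")
    correct = sum(t == p for t, p in zip(true_sources, predicted_sources))
    return correct / len(true_sources)


def per_source_rates(true_sources, predicted_sources):
    """Rate of correct classification within each known source category."""
    totals = Counter(true_sources)
    hits = Counter(t for t, p in zip(true_sources, predicted_sources) if t == p)
    return {src: hits[src] / n for src, n in totals.items()}


# Hypothetical holdout results, not data from this study
truth = ["human", "human", "bovine", "bovine", "deer", "gull"]
calls = ["human", "bovine", "bovine", "bovine", "deer", "human"]
print(arcc(truth, calls))              # 4 of 6 isolates correct
print(per_source_rates(truth, calls))
```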
In contrast to the limited number of phenotypic subtyping methods, numerous genotypic methods have been described, including ribotyping [14, 15, 16], length heterogeneity polymerase chain reaction (LH-PCR), terminal restriction fragment length polymorphism (T-RFLP) PCR [17, 18], repetitive PCR (rep-PCR) [19], denaturing gradient gel electrophoresis (DGGE) [20, 21], pulsed-field gel electrophoresis (PFGE) [22, 23, 24], and amplified fragment length polymorphism (AFLP) [25]. Most of these molecular methods rely on PCR to interrogate a fraction of the target organism’s genetic information. PCR amplification products are subsequently resolved by gel electrophoresis, and the resulting banding pattern may be compared to a reference library to determine the identity of the organism. ARCC values can approach 100% with some of these methods, such as rep-PCR [19]. Despite the success of genotypic methods, BST still needs greater resolving power to discriminate between closely related microorganisms. Newer technologies such as DNA microarrays, which have been employed for various environmental microbiology applications [26], could potentially increase the resolving power of BST analysis [27]. DNA microarrays interrogate DNA samples at the sequence level. In contrast, gel-based methods rely on DNA fragment sizing, in which heterogeneous sequences of similar size can co-migrate. In addition, gel banding patterns are subject to positional variation, whereas DNA microarray profiles consist of physically immobilized, addressable spots. The resolving power of a microarray can be further improved by increasing the number of oligonucleotide elements on the array. Methods and data analysis algorithms for applying DNA microarrays to BST are only beginning to be developed.
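The sizing limitation can be illustrated with a toy example. In the Python sketch below, two hypothetical 20-nt fragments have identical lengths, so they would co-migrate on a gel, yet they give different presence/absence patterns for two probes taken from Table 1. The sketch assumes that a probe reports a fragment whenever the fragment contains the probe sequence or its reverse complement, and it ignores mismatch tolerance and hybridization kinetics.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def probe_pattern(fragment: str, probes: list) -> list:
    """1 if the probe's binding site is present on either strand, else 0.

    Simplification: perfect-match substring search only, no mismatches.
    """
    return [int(p in fragment or revcomp(p) in fragment) for p in probes]


probes = ["AAATACCCG", "CTTTCCAGG"]        # probes 1 and 7 from Table 1
frag_a = "CGGGTATTT" + "GATCGATCGAT"      # hypothetical fragment, 20 nt
frag_b = "CCTGGAAAG" + "GATCGATCGAT"      # hypothetical fragment, 20 nt

print(len(frag_a), len(frag_b))            # same size on a gel
print(probe_pattern(frag_a, probes), probe_pattern(frag_b, probes))
```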
Recently, oligonucleotide microarrays were evaluated for their ability to differentiate 25 closely related Salmonella isolates [27]. Previously, the same authors used a similar microarray approach to discriminate closely related Xanthomonas pathovars [28]. In this study, we aim to build upon these findings and further the development of oligonucleotide microarrays for use in BST. Here we report the application of a microarray, consisting of 198 oligonucleotide elements, to discriminate 17 unique environmental isolates of Enterococcus sp. based on the host source of the bacteria.

Materials and Methods

Bacterial Isolates

A collection of 51 Enterococcus sp. isolates originating from bovine, deer, gull, and human sources was provided by Dr. Shiao Wang (University of Southern Mississippi; Hattiesburg, MS). The isolation and characterization of these strains have been described elsewhere [29]. Isolates were routinely propagated in brain heart infusion broth (Becton Dickinson, San Jose, CA). High molecular weight genomic DNA for PCR analysis was obtained from each isolate using Qiagen’s DNeasy Tissue Kit (Qiagen, Valencia, CA).

PCR Amplification and Labelling

PCR primer BOX A1R (5′ CTA CGG CAA GGC GAC GCT GAC G 3′) was custom synthesized by Qiagen and targets repetitive BOX sequences [19]. Primer BOX A1R was used to amplify select portions of the Enterococcus sp. isolate genomes to be used as target DNAs for microarray analysis. All PCR reactions and their subsequent microarray analyses were carried out in triplicate. Final reaction conditions were as follows: 10 mM Tris, pH 8.3, 50 mM KCl, 4.5 mM MgCl2, 0.001% (w/v) gelatin, 0.2 mM dNTPs, 2 μM BOX A1R primer, and 5 U Taq polymerase (Promega, Madison, WI) in a final reaction volume of 100 μl. A total of 100 ng of genomic DNA was used as template for each reaction. Amplification was carried out in an MJ Research Tetrad thermocycler (MJ Research, Inc., Waltham, MA) programmed as follows: an initial step at 95°C for 2 min, followed by 35 cycles of 94°C for 3 sec, 92°C for 30 sec, 50°C for 60 sec, and 65°C for 8 min, and finally cooling to 4°C at the end of the last cycle. Ten-microliter portions of each reaction were electrophoresed through a 1.0% agarose gel in 1x TAE (40 mM Tris-acetate, 1 mM EDTA) running buffer and stained with SYBR Gold (Molecular Probes, Inc., Eugene, OR) to confirm amplification. The remaining portion of each amplification reaction was ethanol precipitated with sodium acetate [30], and the resulting air-dried DNA pellets were resuspended in 20 μl Millipore water. PCR products were aminoallyl (aa)-labeled as described previously [31]. Briefly, 3.3 μl (3 μg/μl) of random hexamers (Invitrogen, Carlsbad, CA) were added to each resuspended PCR product, and the final volume was brought to 39 μl. The sample was heated to 100°C for five minutes and immediately placed in an ice bath.
Twenty units of DNA polymerase I Klenow fragment (New England Biolabs, Beverly, MA), 5 μl of EcoPol (Klenow) buffer (New England Biolabs), and 2 μl of 3 mM dNTP/aa-labeling mix [100 mM each dNTP, 50 mM aa-dUTP (Ambion, Austin, TX)] were added, and the reaction was incubated at 37°C overnight. The reaction was stopped by adding 5 μl of 0.5 M EDTA. Unincorporated aa-dUTP and free amines were removed from each reaction using the QIAquick PCR purification kit (Qiagen) with the following modifications: PE wash buffer was replaced with a 5 mM KPO4, 80% ethanol solution, and elution buffer was replaced with a 4 mM KPO4 solution. Purified PCR templates were dried in a vacuum centrifuge and resuspended in 4.5 μl of 0.1 M Na2CO3 buffer, pH 9.3. DNA samples were labeled with Cy5 by adding 4.5 μl of a Cy5 mono-Reactive Dye Pack solution (Amersham Biosciences, Piscataway, NJ) and allowing the reaction to proceed in the dark at room temperature for two hours. The reaction was stopped by the addition of 35 μl of 100 mM NaOAc. Free dye was removed from the samples using the QIAquick PCR purification kit (Qiagen) according to the manufacturer’s instructions. DNA samples were dried and immediately processed for microarray analysis.
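As a quick check on the thermocycler program given above, the Python sketch below sums its programmed hold times. It assumes each listed step is a single hold and excludes ramp time and the final 4°C hold, so the actual run is somewhat longer than the figure it reports (about 5.6 h).

```python
# Program from Materials and Methods as (temperature in °C, hold in seconds)
INITIAL_STEPS = [(95, 2 * 60)]
CYCLE_STEPS = [(94, 3), (92, 30), (50, 60), (65, 8 * 60)]
N_CYCLES = 35


def total_hold_seconds(initial, cycle, n_cycles):
    """Sum of programmed holds; ramp time and the final 4 °C hold are excluded."""
    return sum(hold for _, hold in initial) + n_cycles * sum(hold for _, hold in cycle)


secs = total_hold_seconds(INITIAL_STEPS, CYCLE_STEPS, N_CYCLES)
print(secs, round(secs / 3600, 2))  # 20175 5.6
```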

Microarray Oligonucleotide Probes and Fabrication

One hundred ninety-eight 9-mers (Table 1), each with an amine modification at the 5′ end (Sigma-Genosys, The Woodlands, TX), were randomly selected from a list of 102,403 9-mer sequences that conform to criteria described previously [28]. Briefly, 9-mer sequences had GC contents between 44% and 55% and could not contain: 1) repeats of four nucleotides or longer, 2) inverted repeats of three nucleotides or longer, 3) dual-terminal inverted repeats of three nucleotides or longer, or 4) single-terminal inverted repeats of three nucleotides or longer. In addition, all 9-mer sequences that occurred in Enterococcus sp. rRNA genes present in GenBank as of May 2003 were eliminated. A Cy3-labeled control oligonucleotide, 5′ TTG GCA GAA GCT ATG AAA CGA TAT GGG 3′, with an amine modification at the 5′ end, was used as a positional reference and hybridization control.
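Part of the probe selection can be expressed as a filter. The Python sketch below is a partial re-implementation under stated assumptions: it reads the 44-55% GC criterion as 4 or 5 G/C bases per 9-mer (4/9 ≈ 44.4%, 5/9 ≈ 55.6%; Table 1 includes five-G/C probes such as GACGAGCTT), reads "repeats of four nucleotides" as homopolymer runs of four or more, and removes candidates found in a supplied reference sequence such as an rRNA gene. The inverted-repeat criteria of [28] are not reproduced, and the function names are hypothetical.

```python
import re

HOMOPOLYMER_4 = re.compile(r"([ACGT])\1{3,}")   # a base followed by 3+ copies of itself


def gc_count(seq: str) -> int:
    return sum(base in "GC" for base in seq)


def passes_basic_filters(kmer: str) -> bool:
    """Partial, assumed reading of the selection criteria (hypothetical helper).

    Checks only GC content (4 or 5 G/C in a 9-mer) and homopolymer runs of
    4 or more; the inverted-repeat rules of [28] are not implemented.
    """
    if len(kmer) != 9 or set(kmer) - set("ACGT"):
        return False
    if gc_count(kmer) not in (4, 5):
        return False
    return HOMOPOLYMER_4.search(kmer) is None


def exclude_reference_kmers(candidates, reference_seqs, k=9):
    """Drop candidates that occur verbatim in any reference sequence
    (e.g. rRNA genes); only the given strand is scanned."""
    seen = {s[i:i + k] for s in reference_seqs for i in range(len(s) - k + 1)}
    return [c for c in candidates if c not in seen]


print(passes_basic_filters("GACGAGCTT"))   # True: 5 G/C, no run of 4
print(passes_basic_filters("AAAATGCCG"))   # False: AAAA run
```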

Table 1

Microarray Oligonucleotide Probes

ID No.	Sequences (5′-3′)
1	AAATACCCG
2	CAAATACCC
3	AATTGCCCT
4	GGGCCATTT
5	GACGAGCTT
6	AGCAGATAG
7	CTTTCCAGG
8	ATGACAGAC
9	TGAGAGGCT
10	GGTAGTGCT
11	CATTGTCCG
12	ATCTCTTGC
13	CTACCAAGG
14	AACACTACC
15	CCATAATCC
16	GAACTGGCA
17	CAAATCTGG
18	GCGATGTTG
19	AGAGAAGCC
20	TCAGCGCAT
21	GCAACCAAA
22	CTTGATTCC
23	TACCCACTG
24	TTACACCGC
25	CTGCGATCA
26	GAGCTGTCA
27	TGGGCGTTT
28	GGGCGTTTA
29	CATCTGTCG
30	AAGTAGCCC
31	AATATGCGG
32	GTACGGAGT
33	TCTGCTATG
34	CAAATGTCC
35	AAATCTCGC
36	AATTTCGGC
37	ACTCTCCCT
38	CCAAGTTCT
39	GAAAGAGCA
40	CCCTTTCCA
41	ACCTATGCG
42	TTGGGTTCG
43	ATACCGATG
44	TGCTTCACA
45	ACGCTACGA
46	TACTGTCGG
47	GCTGCTACA
48	TCCAACTAG
49	CCGCAAAGT
50	GATTAGCGC
51	GGATAGCGA
52	TATTGGTCG
53	AAGCAGCAG
54	CAGACACGA
55	AAAGTGCCC
56	CAATCGTTC
57	AATCCGTAG
58	CAAGAGGGT
59	AATGGAACC
60	CATATCCTC
61	AACTTGCCG
62	CATCTTGAC
63	AAGACAGTG
64	CACTACGCA
65	AAGGGATGA
66	CACGAATCC
67	ATATCACGG
68	CAGATGACC
69	ATAGTCCAG
70	CAGCAGATG
71	ATTCACACC
72	CAGGTGTGT
73	ATTGGTGGG
74	CTATACGCA
75	ATCAGGGAA
76	CTACACGCA
77	ATCGAGCCT
78	CTTATAGGG
79	ATGTCAAGG
80	CTTCCATAC
81	ATGCCGGTT
82	CTTGGAACC
83	ATGGACACC
84	CTCATAGGT
85	ATGGGTACG
86	CTCTGTTCC
87	ACATGACAG
88	CTCCTTTGC
89	ACACACCAT
90	CTCGATCAC
91	ACAGTCTCA
92	CTGAGTACA
93	ACTAAGCGC
94	CTGTAGACC
95	ACTTAGCCA
96	CTGCTACAC
97	ACTTCGTCG
98	CTGCTGTGT
99	ACTCTCTCT
100	CTGGCTTCT
101	TGGCTACGT
102	GGGCTGAAT
103	TGGCTCGAA
104	GGCCCATAT
105	TGCGTACAT
106	GGCTCAAGA
107	TGCCCAAGA
108	GGTTCTGTA
109	TGCAGAACG
110	GGTTTGTGT
111	TGCAAGTTC
112	GGTAGTTTC
113	TGTCTATCG
114	GGACCTAAC
115	TGAGGATAG
116	GGAAATCTG
117	TGATGAGAC
118	GCGATATTC
119	GCCAATGTT
120	TCGCCCTTA
121	TCGTTATGG
122	GCTTCCGTT
123	TCCGAGACT
124	GCTTGTGAT
125	TCCGTCAAG
126	GCTACCTTC
127	TCCTTGGTT
128	GCACTCTAA
129	TCCATCGTG
130	GCATGTAGG
131	TCTCGTACC
132	GTGGGCATT
133	GCAAAGCCT
134	TCTTCCTAC
135	GTGCTGGAT
136	TCTACCCAC
137	GTGTAGAAC
138	TCAGCATAG
139	GTGAGGTTC
140	TCAACCTTC
141	GTCACGTTA
142	TTGAGCTGA
143	GTTTCGTGT
144	TTCGCACTC
145	GTTAGGGTG
146	TTCTAGCGC
147	GTATCGCTA
148	TTAGCGTGC
149	GTAACTGTC
150	TTACCTGGC
151	GAGGAGATA
152	TAGCGAGTG
153	GAGTTTCAG
154	TAGTCGTCT
155	GAGAGAAAC
156	TAGCATAGG
157	GACTCTACG
158	TACGGTTCT
159	GACAGTTCA
160	TACCCAGTT
161	GATGATACC
163	GAAGGAAAG
164	TAAGCCGCA
165	GAACTAAGC
166	AGGCTGTTC
167	CGGTCAGAT
168	AGGTAGGAA
169	CGCCTATGT
170	AGCCGTACA
171	CGTGTTCTC
172	AGCTATGCG
173	CGTGGTTAT
174	AGCAAGTGT
175	CGTCTAACC
176	AGTCTCAAG
177	CGAAGTTTG
179	CGACTGGAA
180	AGTTACCCT
181	CGAAACAGG
182	AGAGTTCGA
183	CCGTGGAAA
184	CACAACTCT
185	AACGAAACG
186	AACACGCTT
187	CATGAGGTT
188	CATAGCGAA
189	CAAACGAGG
190	AAACACGTC
191	CTGCTACGA
192	ACTAGCGGT
193	ACAACACTC
194	CTATGTCGG
195	CTATCAACC
196	AACGATACC
197	CAAACGGGA
198	CTGTCACTG

5′ amine-modified 9-mer oligonucleotide microarray probes and corresponding ID numbers

Microarrays were fabricated on aldehyde-coated glass microscope slides (TeleChem International, Inc., Sunnyvale, CA) using the Bio-Rad VersArray ChipWriter (Bio-Rad, Hercules, CA) equipped with SMP3 Stealth microspotting pins (TeleChem International, Inc.). Prior to fabrication, amine-modified oligonucleotides were transferred to a 384-well plate (Whatman, Clifton, NJ) and diluted to a concentration of 80 μM in 50% dimethyl sulfoxide (DMSO). Probes were printed in duplicate, using a 2-pin configuration, at a relative humidity of 60%. The resulting grid pattern and corresponding oligonucleotide probe locations are illustrated in Fig. 1. After printing, slides were baked for 45 minutes at 80°C, briefly washed with 0.2% SDS, and rinsed with reagent grade water. Free aldehyde groups were chemically blocked by soaking printed slides in a fresh NaBH4 solution [0.75 g NaBH4 (Sigma, MO), 225 ml phosphate buffered saline (pH 7.0), 66.5 ml 100% ethanol] for five minutes. Following chemical blocking, printed slides were dipped briefly 3 times in 0.2% SDS, washed for one minute in reagent grade water, and individually spun dry in 50 ml Falcon conical tubes (Fisher Scientific, MO) at 700 rpm for 10 minutes in a tabletop centrifuge. Microarray substrates were stored at room temperature in a desiccator.
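For reference, the blocking solution's composition can be converted into concentration units. The Python arithmetic below assumes standard atomic weights and that the PBS and ethanol volumes are additive (291.5 ml total); it gives roughly 68 mM NaBH4, or about 0.26% (w/v).

```python
# Standard atomic weights (g/mol)
NA, B, H = 22.990, 10.811, 1.008
M_NABH4 = NA + B + 4 * H            # about 37.83 g/mol

mass_g = 0.75
volume_ml = 225 + 66.5              # PBS + ethanol, assumed additive

molarity = (mass_g / M_NABH4) / (volume_ml / 1000)   # mol/L
percent_wv = 100 * mass_g / volume_ml                # g per 100 ml
print(f"{molarity * 1000:.0f} mM, {percent_wv:.2f}% (w/v)")  # 68 mM, 0.26% (w/v)
```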

Figure 1

Configuration of the printed microarray spots and the physical location of the corresponding oligonucleotide probes as referenced in Table 1. Control oligonucleotide designated by C.

Microarray Hybridization

Prior to hybridization, printed slides were prehybridized in 0.1% SDS, 4X SSC (1X SSC: 0.15 M NaCl, 0.015 M trisodium citrate, pH 7.0), and 10 mg/ml bovine serum albumin (BSA) in 50 ml Falcon conical tubes at 40°C with slight agitation for 2 hours. Prehybridized slides were rinsed 5 times in reagent grade distilled water and chilled to 4°C on a solid metal platform. Cy5 aminoallyl-labelled DNA targets were resuspended in 15 μl of 4X SSC, heated at 95°C for 5 min, and immediately placed on ice. The Cy3-labelled oligonucleotide 5′ CCC ATA TCG TTT CAT AGC TTC TGC CA 3′ was also included in the hybridization reaction (final concentration 0.6 μM) as a control to hybridize with the control oligonucleotide attached to the microarray. Chilled hybridization reactions were pipetted onto prechilled printed microarray slides, covered with array cover slips (PGC Scientifics, Gaithersburg, MD), and incubated overnight at 4°C as described previously [28]. Hybridized microarrays were gently rinsed in 4°C 4X SSC 5 times at 1-minute intervals, followed by a final 30-second rinse in reagent grade water. Microarray slides were spun dry in 50 ml conical tubes as described above prior to scanning.
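The control pairing can be checked directly from the two sequences given in Materials and Methods. The Python sketch below takes the reverse complement of the printed control oligonucleotide and confirms that the 26-nt Cy3-labelled oligonucleotide matches it over all but the 5′-terminal base of the printed 27-mer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement, ignoring the spaces used for readability."""
    return seq.replace(" ", "").translate(COMPLEMENT)[::-1]


array_control = "TTG GCA GAA GCT ATG AAA CGA TAT GGG"   # printed probe, 5'->3'
cy3_control = "CCC ATA TCG TTT CAT AGC TTC TGC CA"      # labelled target, 5'->3'

site = revcomp(array_control)          # sequence that pairs with the printed probe
target = cy3_control.replace(" ", "")
print(len(site), len(target), site.startswith(target))  # 27 26 True
```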

Image Analysis and Statistics

Processed microarray slides were scanned at 532 nm (Cy3) and 635 nm (Cy5) using the VersArray ChipReader system (Bio-Rad, Hercules, CA) configured at 5 μm resolution. Spot intensity data from the resulting 16-bit TIF images were extracted using the ArrayPro Analyzer software (Media Cybernetics, Silver Spring, MD). Background signal was determined locally for each spot using the “local corners” option. Individual spot intensities, minus local backgrounds, were normalized to the total spot intensity of all spots on each microarray, and the normalized values were log-transformed. An empirical data reduction process (see Results) was used to identify which of the 198 probe spots carried the most information (for example, spots that were always “on” or always “off” for all isolates carry no information for this dataset) and which spots were too variable among replicates of the same isolate. Principal Components Analysis (PCA) and cluster and classification analyses were run on the remaining dataset using Pirouette (Infometrix, Inc., Bothell, WA).
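The per-array preprocessing described above (local background subtraction, normalization to total spot intensity, log transform) can be sketched with NumPy. The base-10 logarithm and the clipping of non-positive background-subtracted values to a small floor are assumptions; the text specifies neither, and ArrayPro's own export format is not modelled.

```python
import numpy as np


def normalize_spots(foreground, background, floor=1.0):
    """Local-background subtraction, normalization to total array intensity,
    then log10. Net values below `floor` are clipped to `floor` so the log is
    defined (an assumption; the text does not say how such spots were handled)."""
    fg = np.asarray(foreground, dtype=float)
    bg = np.asarray(background, dtype=float)
    net = np.clip(fg - bg, floor, None)
    fraction = net / net.sum()          # each spot as a share of the array total
    return np.log10(fraction)


# Hypothetical intensities for four spots on one array
log_vals = normalize_spots([1200, 300, 5000, 150], [200, 100, 1000, 200])
print(np.round(log_vals, 3))
```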

Results

Oligonucleotide Microarray Bacterial Source Tracking

Oligonucleotide microarrays were evaluated for their ability to resolve BOX-PCR amplification products derived from environmental Enterococcus sp. isolates originating from deer, bovine, gull, and human sources. Purified genomic DNA from Enterococcus sp. isolates was subjected to BOX-PCR amplification, and the resulting amplification products were visualized by agarose gel electrophoresis. The results of a typical experiment are shown in Fig. 2, which represents the subset of samples originating from deer. Agarose gel electrophoresis confirmed amplification as well as the consistency of the BOX-PCR reaction. PCR products were fluorescently labelled with aminoallyl-dUTP and Cy5 and then resolved by hybridization to in-house fabricated 9-mer oligonucleotide microarrays (see Materials and Methods). The results of a representative microarray experiment are shown in Fig. 3, in which replicate BOX-PCR reactions from Enterococcus sp. deer isolate 49.1.1 were hybridized to replicate oligonucleotide microarrays. A histogram of fluorescent spot intensities indicated that the intensities of these randomly selected nonamers follow a lognormal distribution (data not shown). Of the 17 environmental isolates analysed, not all replicate microarrays were usable. For six of these isolates (4 human and 2 deer), only a single microarray hybridization replicate, consisting of duplicate microarray spots, was available for analysis. For the remaining 11 isolates and their replicates, spots that exhibited extreme variability in normalized spot intensity among replicates within a specific source were identified and eliminated from the analysis for all isolates. Spots whose normalized intensities had above-median standard deviations (≥ 0.7) within source-specific datasets (i.e., bovine, deer, etc.) were eliminated, leaving 45 of the 200 probes. The remaining 45 probes were then used to analyze all 17 isolates.
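The replicate-variability filter above admits more than one reading. The NumPy sketch below implements one interpretation, stated as an assumption: for each source, compute each probe's replicate standard deviation for every isolate with two or more arrays, take the median across those isolates, and drop the probe if that median reaches 0.7 in any source. Isolate identifiers are assumed to be unique across sources, and the toy data are hypothetical.

```python
import numpy as np


def stable_probes(log_intensity, isolate_ids, sources, sd_cutoff=0.7):
    """Indices of probes that pass an assumed replicate-variability filter.

    log_intensity: (n_arrays, n_probes) normalized log intensities.
    isolate_ids, sources: one label per array; replicate arrays share an isolate id.
    For each source, the replicate SD of every isolate with >= 2 arrays is computed
    per probe, and the median over those isolates must stay below sd_cutoff.
    """
    X = np.asarray(log_intensity, dtype=float)
    ids = np.asarray(isolate_ids)
    srcs = np.asarray(sources)
    keep = np.ones(X.shape[1], dtype=bool)
    for src in np.unique(srcs):
        sds = []
        for iso in np.unique(ids[srcs == src]):
            rows = X[ids == iso]
            if rows.shape[0] >= 2:          # single-replicate isolates are skipped
                sds.append(rows.std(axis=0, ddof=1))
        if sds:
            keep &= np.median(np.vstack(sds), axis=0) < sd_cutoff
    return np.flatnonzero(keep)


# Toy data: probe 0 is reproducible, probe 1 is noisy within isolate b1
X = [[0.0, 0.0], [0.1, 0.1],   # isolate a1 (source A)
     [1.0, 1.0], [1.1, 1.2],   # isolate a2 (source A)
     [2.0, 0.0], [2.1, 2.0],   # isolate b1 (source B)
     [3.0, 3.0]]               # isolate b2 (source B), single replicate
ids = ["a1", "a1", "a2", "a2", "b1", "b1", "b2"]
srcs = ["A", "A", "A", "A", "B", "B", "B"]
print(stable_probes(X, ids, srcs))   # [0]
```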

Figure 2

BOX-PCR agarose gel fingerprints, run in triplicate, from Enterococcus sp. isolates originating from deer. A HindIII-digested lambda DNA marker was included in the gel run as a size standard.

Figure 3

Oligonucleotide microarray replicate hybridization profiles resulting from hybridization with BOX-PCR amplification products from deer isolate 49.1.1.

PCA and HCA Analysis

The dendrogram of a complete-linkage, Euclidean-distance Hierarchical Cluster Analysis (HCA) did not show good origin-specific clustering of the isolates. In particular, the bovine-origin replicates were spread among several clusters (see the example portion of the dendrogram in Fig. 4). A K-Nearest Neighbour (KNN) classification confirmed the HCA, misclassifying 8% of the deer, 16% of the human, and 50% of the gull isolates as bovine isolates. PCA was visually superior at separating origin-specific clusters, even with as few as 3 factors (Fig. 5). A Soft Independent Modelling (SIM) classification confirmed the PCA, resulting in zero misclassifications using 5 factors for each class. Numerical descriptions of the SIM classification model for bovine-origin Enterococcus sp. are presented in Table 2. These factors describe the multidimensional subspace within the PCA projection in which the various microarray source profiles lie. The factor values indicate the relative linear weight (loading) of each probe in each factor. For instance, probes 2 and 16 have the highest weights for the most important factor, Factor 1, which accounts for 30% of the variability. Thus, for this set of isolates, SIM classifications based on 5 factors for each class, that is, 5 linear combinations of the 45 probes, sufficed to distinguish the origins of the Enterococcus sp. isolates.
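To make the SIM approach concrete, the NumPy sketch below fits one principal-component model per source class and assigns each sample to the class whose model leaves the smallest scaled residual. This is a simplified SIMCA-style classifier written for illustration: it uses a plain residual-ratio rule rather than an F-test, it is not Pirouette's implementation, and the data are synthetic, well-separated profiles rather than the study's measurements.

```python
import numpy as np


class SimcaSketch:
    """Simplified SIMCA-style classifier: one PCA model per class; a sample goes
    to the class whose model leaves the smallest scaled residual. Not Pirouette's
    implementation (no F-test, no leverage term)."""

    def __init__(self, n_factors=5):
        self.n_factors = n_factors
        self.models = {}

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        for label in np.unique(y):
            Xc = X[y == label]
            n, p = Xc.shape
            if n < 3:
                raise ValueError(f"class {label!r} needs at least 3 samples")
            k = min(self.n_factors, n - 2)   # keep residual degrees of freedom > 0
            mean = Xc.mean(axis=0)
            D = Xc - mean
            _, _, vt = np.linalg.svd(D, full_matrices=False)
            P = vt[:k].T                       # p x k loadings
            R = D - D @ P @ P.T                # residuals outside the class subspace
            s0 = np.sqrt((R ** 2).sum() / ((n - k - 1) * (p - k)))
            self.models[label] = (mean, P, s0, k)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        labels = list(self.models)
        ratios = np.empty((X.shape[0], len(labels)))
        for j, label in enumerate(labels):
            mean, P, s0, k = self.models[label]
            D = X - mean
            R = D - D @ P @ P.T
            s = np.sqrt((R ** 2).sum(axis=1) / (X.shape[1] - k))
            ratios[:, j] = s / s0              # residual relative to the class's own spread
        return np.array(labels)[ratios.argmin(axis=1)]


# Synthetic, well-separated profiles over 45 "probes" (not study data)
rng = np.random.default_rng(0)
sources = ["bovine", "deer", "gull", "human"]
centers = {s: rng.normal(0.0, 3.0, 45) for s in sources}


def draw(n_per_class):
    X = np.vstack([centers[s] + rng.normal(0.0, 0.2, (n_per_class, 45)) for s in sources])
    return X, np.repeat(sources, n_per_class)


X_train, y_train = draw(8)
X_test, y_test = draw(4)
model = SimcaSketch(n_factors=5).fit(X_train, y_train)
print((model.predict(X_test) == y_test).mean())
```

Capping the factor count at n - 2 per class is a design choice for small classes: with as few replicates as in this study, using as many factors as samples would leave no residual to scale against.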

Figure 4

Hierarchical Cluster Analysis of normalized microarray spot intensities of replicates of 17 environmental isolates of Enterococcus sp. The dendrogram does not show good clustering by host origin at reasonable similarities. The bovine-origin replicates were most spread.

Figure 5

Principal Components Analysis of normalized microarray spot intensities of replicates of 17 environmental isolates of Enterococcus sp., colored by host origin: deer is red, bovine is yellow, human is green, gull is purple. For this 3D view only the first 3 components can be plotted, but clustering is evident.
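The projection in Fig. 5 can be sketched as mean-centering followed by a singular value decomposition. The NumPy function below returns scores on the first three components and the fraction of total variance each explains, which is the kind of quantity behind statements such as Factor 1 accounting for 30% of the variability. Mean-centering without scaling is an assumption; the preprocessing options used in Pirouette are not stated, and the example data are random.

```python
import numpy as np


def pca_scores(X, n_components=3):
    """Project mean-centered data onto its leading principal components.

    Returns (scores, explained), where explained[i] is the fraction of total
    variance captured by component i. Mean-centering only, no scaling (assumption).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = Xc @ vt[:n_components].T     # coordinates for a 3D score plot
    return scores, explained[:n_components]


# Synthetic example: 12 profiles over 45 probes (not study data)
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 45))
scores, explained = pca_scores(X)
print(scores.shape, np.round(explained, 3))
```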

Table 2

The 5-factor oligonucleotide microarray SIM classification model for bovine-origin Enterococcus

Probe ID No.	Factor 1	Factor 2	Factor 3	Factor 4	Factor 5
10	−0.0693	−0.0124	−0.1724	0.3365	0.1906
101	−0.2427	−0.0110	0.1314	−0.0107	−0.0573
103	−0.0893	−0.0051	0.1625	0.0274	0.0169
109	0.0946	−0.1203	0.0935	−0.2996	0.2562
110	0.1979	0.1221	0.0276	0.1051	−0.0005
116	0.1914	0.0207	0.1926	0.0159	0.1193
120	−0.0518	0.0044	0.2388	0.1803	0.1916
129	−0.0098	−0.1774	−0.2056	0.1891	0.0237
135	0.2337	−0.0282	−0.0125	0.1349	0.1671
139	−0.0159	0.1796	−0.0037	−0.0411	0.2386
143	−0.0029	−0.1737	−0.0814	−0.1052	−0.2598
148	−0.1357	0.0204	0.2510	0.1294	0.0086
151	0.0471	−0.0152	−0.0363	0.3958	0.1300
152	−0.0195	−0.2508	0.1577	0.0328	−0.1484
156	−0.1935	−0.0510	0.1517	0.0772	−0.0967
16	0.2880	−0.0279	0.0481	0.0090	0.0807
163	−0.1135	−0.1457	0.1707	0.2306	−0.0090
164	−0.0643	0.1756	0.1276	−0.2373	0.1324
173	−0.0276	−0.0829	−0.1689	−0.1325	0.2091
179	0.1752	0.1325	0.2503	0.0267	0.0454
183	−0.1061	−0.0186	0.1716	0.3182	0.1273
188	0.0372	−0.1136	0.2749	−0.0896	−0.1253
197	0.1127	0.1055	0.1382	0.0709	−0.2228
2	0.3161	−0.1013	−0.0028	0.0365	−0.0345
23	0.0120	0.0386	0.1942	−0.2457	0.0996
24	0.0043	−0.1837	0.1502	−0.0565	−0.1489
27	0.0415	0.2624	0.0418	0.1381	−0.0770
3	−0.1492	−0.1004	0.1142	0.0727	−0.1953
31	−0.1750	−0.0222	0.1739	−0.0174	−0.0754
39	0.0871	−0.2652	0.0902	0.1966	0.0277
42	−0.0778	−0.0590	0.1760	−0.0517	0.2568
43	0.0425	−0.2962	−0.0494	0.1284	0.0829
51	0.2070	−0.0761	0.0827	0.1584	−0.0281
52	0.1503	0.2894	−0.0144	0.0820	−0.1347
54	−0.1305	0.1739	0.0558	0.0927	0.2616
61	−0.0134	0.1594	0.3046	−0.0007	0.0549
63	0.2623	−0.1179	0.0312	0.0517	0.1014
65	0.2654	0.1653	0.0571	−0.0874	−0.0411
67	0.1681	−0.2348	−0.0132	−0.1448	0.1331
7	0.2416	−0.2058	0.0358	−0.0196	−0.0125
72	−0.0466	0.1237	−0.3268	0.0042	0.1066
74	0.1058	0.2964	0.0585	0.1051	−0.1957
76	−0.0709	−0.1606	0.0171	−0.1116	−0.1814
85	0.2414	0.0085	0.1413	−0.0517	−0.0178
97	−0.1241	−0.0890	0.1126	−0.1321	0.3621

Discussion

In an effort toward adapting new and defensible methods for assessing and managing the risk posed by microbial pollution, we evaluated the utility of oligonucleotide microarrays for bacterial source tracking. Specifically, we evaluated the ability of oligonucleotide microarrays to visually discriminate 17 unique environmental isolates of Enterococcus sp. based on host origin, i.e. gull, bovine, deer, and human. As observed in an earlier study by Kingsley et al. [28], many of the microarray oligonucleotide probes exhibited high variation in fluorescent spot intensity within a series of replicates. A stringent down-selection for reproducible spot intensities within replicates produced a set of 45 probes, and this reduced set proved useful for classifying isolates by source. It should be reiterated that this data reduction was performed to improve reproducibility and had the side effect of improving the classification fit. This is the opposite of the familiar problem of model overfitting, in which the addition of extra variables improves classification at the expense of robustness and reproducibility. Following data reduction, a number of multivariate statistical procedures are available for evaluating the relationships among microarray hybridization profiles. Previously, PCA was successfully used to visualize relationships among microarray hybridization profiles derived from closely related Xanthomonas pathovars [28]. In this study, PCA and HCA were compared for their ability to visually cluster microarray hybridization profiles based on the environmental source from which the Enterococcus sp. isolates originated. Classification of Enterococcus sp. isolates by source using Soft Independent Modelling of Class Analogy with 5 factors was more accurate than classification based on K-Nearest Neighbour calculations.
This difference is apparent when comparing the PCA, which visualizes part of the SIM calculations, with the HCA, which visualizes part of the KNN calculations. The implication of these results for the application of random oligonucleotide microarrays to BST is that, given the reproducibility issues, factor-based variable selection such as in PCA and SIM greatly outperforms dendrogram-based similarity measures such as in HCA and KNN. Given only the microarray intensity values for a sample, the SIM model outputs the best-fitting class for that sample; for this dataset it made zero misclassifications. Further optimization of source classifications may result from applying information theory to detect patterns in microarray profiles. In particular, bacterial source tracking may benefit from measures of classification utility, such as those based on mutual information, that have been developed as part of information theory [32]. However, successful application of information theory to microarray analysis will depend on accurately understanding, capturing, and modelling the sources of variation in the microarray experimental process. Some of these sources of variation, such as PCR amplification and microarray fabrication, have been described previously [27]. Once improved microarray experimental protocols and statistical methods have been developed, it will be possible to incorporate microarray technology into the growing toolbox of technologies that is rapidly defining bacterial source tracking. As the recent study by Stoeckel et al. [33] demonstrates, no single method currently accomplishes the ambitious goal of source tracking; a combination of methods is likely to lead to effective source tracking.
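As an illustration of the mutual-information measures mentioned above, the Python sketch below computes a plug-in estimate of I(X;Y), in bits, between a probe's binarized on/off calls and the source labels. The on/off calls are hypothetical; the study did not define a threshold, and this is not the relevance-network method of [32] itself. A probe that is on only in bovine isolates carries about 0.81 bits about the source here, while a probe whose calls are unrelated to source carries none.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over observed (x, y) of p(x,y) * log2(p(x,y) / (p(x) p(y)))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


# Hypothetical on/off calls for two probes across eight isolates
source = ["bovine", "bovine", "deer", "deer", "gull", "gull", "human", "human"]
probe_a = [1, 1, 0, 0, 0, 0, 0, 0]   # "on" only in bovine isolates
probe_b = [1, 0, 1, 0, 1, 0, 1, 0]   # on/off independent of source
print(round(mutual_information(probe_a, source), 3))  # 0.811
print(round(mutual_information(probe_b, source), 3))  # 0.0
```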

Selected references

1. Determining sources of fecal pollution in a rural Virginia watershed with antibiotic resistance patterns in fecal streptococci.

Authors: C Hagedorn; S L Robinson; J R Filtz; S M Grubbs; T A Angier; R B Reneau
Journal: Appl Environ Microbiol Date: 1999-12

2. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements.

Authors: A J Butte; I S Kohane
Journal: Pac Symp Biocomput Date: 2000

3. Classification of antibiotic resistance patterns of indicator bacteria by discriminant analysis: use in predicting the source of fecal contamination in subtropical waters.

Authors: V J Harwood; J Whitlock; V Withington
Journal: Appl Environ Microbiol Date: 2000-09

4. Comparison of seven protocols to identify fecal contamination sources using Escherichia coli.

Authors: Donald M Stoeckel; Melvin V Mathes; Kenneth E Hyer; Charles Hagedorn; Howard Kator; Jerzy Lukasik; Tara L O'Brien; Terry W Fenger; Mansour Samadpour; Kriston M Strickler; Bruce A Wiggins
Journal: Environ Sci Technol Date: 2004-11-15

5. Simultaneous detection and differentiation of Escherichia coli populations from environmental freshwaters by means of sequence variations in a fragment of the beta-D-glucuronidase gene.

Authors: A H Farnleitner; N Kreuzinger; G G Kavka; S Grillenberger; J Rath; R L Mach
Journal: Appl Environ Microbiol Date: 2000-04

6. Antibiotic resistance and genotypic characterization by PFGE of clinical and environmental isolates of enterococci.

Authors: G Dicuonzo; G Gherardi; G Lorino; S Angeletti; F Battistoni; L Bertuccini; R Creti; R Di Rosa; M Venditti; L Baldassarri
Journal: FEMS Microbiol Lett Date: 2001-07-24

7. Discriminant analysis of ribotype profiles of Escherichia coli for differentiating human and nonhuman sources of fecal pollution.

Authors: S Parveen; K M Portier; K Robinson; L Edmiston; M L Tamplin
Journal: Appl Environ Microbiol Date: 1999-07