John M Albert, Junko Munakata-Marr, Luis Tenorio, Robert L Siegrist.
Abstract
Pattern recognition has been applied to environmental systems to identify numerous pollution sources, including aerosolized lead and petroleum hydrocarbons. In recent years, DNA fingerprinting has gained widespread application as a means to characterize genetic variations for such purposes as microbial source tracking. This approach, however, is strongly dependent on the statistical and image analyses applied. Several statistical analyses of rep-PCR DNA fingerprints were assessed as a means to differentiate between potential sources of fecal contamination. GelCompar II and methods based on penalized discriminant analysis (PDA) and k-nearest neighbors (KNN) classification procedures were used to differentiate between 10 source groups within a library containing DNA fingerprints of 548 Escherichia coli isolates from known human and nonhuman sources. KNN performed significantly better than PDA in a jackknife analysis, though the library was not large enough to detect significant differences between GelCompar II and the other two methods. GelCompar II and KNN both attained ≥90% correct classification in a holdout procedure. In addition, interpoint distance analyses indicate coherency within source groups, while library randomization demonstrated that KNN does not create artificial groupings. This investigation stresses the need to understand the limitations of statistical analyses used in pattern recognition of DNA fingerprints.
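The abstract evaluates KNN by jackknife (leave-one-out) classification and checks it against a randomized library. A minimal Python sketch of that evaluation idea follows. It assumes each fingerprint has already been reduced to a fixed-length numeric vector and uses plain Euclidean distance; the vectors, labels, k value, and function names (`knn_predict`, `jackknife_accuracy`, `randomized_accuracy`) are hypothetical illustrations, not the authors' data, code, or the GelCompar II algorithm.

```python
import math
import random
from collections import Counter

# Hypothetical toy library: each fingerprint is a fixed-length numeric vector
# (e.g. band intensities at aligned positions). These are NOT data from the study.
DEMO_X = [
    [0.0, 0.1, 0.2], [0.1, 0.0, 0.1], [0.2, 0.1, 0.0], [0.1, 0.2, 0.1],
    [5.0, 5.1, 4.9], [5.1, 4.9, 5.0], [4.9, 5.0, 5.1], [5.0, 4.8, 5.2],
]
DEMO_Y = ["human"] * 4 + ["cattle"] * 4


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training isolates nearest to the query."""
    ranked = sorted(zip((euclidean(x, query) for x in train_x), train_y))
    votes = Counter(label for _, label in ranked[:k])
    # most_common keeps first-seen order on ties, so the nearer label wins a tie.
    return votes.most_common(1)[0][0]


def jackknife_accuracy(xs, ys, k):
    """Leave-one-out: classify each isolate against all the other isolates."""
    correct = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, x, k) == y:
            correct += 1
    return correct / len(xs)


def randomized_accuracy(xs, ys, k, trials, seed=0):
    """Mean jackknife accuracy after shuffling the source labels (library randomization)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        shuffled = list(ys)
        rng.shuffle(shuffled)
        scores.append(jackknife_accuracy(xs, shuffled, k))
    return sum(scores) / trials


if __name__ == "__main__":
    print("jackknife accuracy:", jackknife_accuracy(DEMO_X, DEMO_Y, k=3))
    print("randomized-label accuracy:",
          randomized_accuracy(DEMO_X, DEMO_Y, k=3, trials=20))
```

With coherent source groups the leave-one-out accuracy is high, while shuffling the labels should pull it down; if a classifier scored just as well on shuffled labels, that would suggest it was imposing groupings rather than finding them. A real fingerprint library would need a distance measure suited to its band or densitometric representation rather than this Euclidean placeholder.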
Year: 2003 PMID: 14594360 DOI: 10.1021/es034211q
Source DB: PubMed Journal: Environ Sci Technol ISSN: 0013-936X Impact factor: 9.028