
Novel application of a statistical technique, Random Forests, in a bacterial source tracking study.

Amanda Smith, Blair Sterba-Boatwright, Joanna Mott.

Abstract

In this study, data from bacterial source tracking (BST) analysis using antibiotic resistance profiles were examined using two statistical techniques, Random Forests (RF) and discriminant analysis (DA), to determine sources of fecal contamination of a Texas water body. Cow Trap and Cedar Lakes are potential oyster harvesting waters located in Brazoria County, Texas, that have been listed as impaired for bacteria on the 2004 Texas 303(d) list. Escherichia coli of unknown source were isolated from water samples collected in the study area during two sampling events. Isolates were confirmed as E. coli using carbon source utilization profiles and then analyzed via antibiotic resistance analysis (ARA), following the Kirby-Bauer disk diffusion method. Zone diameters from ARA profiles were analyzed with both DA and RF. Using a two-way classification (human vs. nonhuman), both DA and RF assigned over 90% of the 299 unknown source isolates to a nonhuman source. The average rates of correct classification (ARCCs) for the library of 1172 isolates using DA and RF were 74.6% and 82.3%, respectively. ARCCs from RF ranged from 7.7 to 12.0% higher than those from DA. Rates of correct classification (RCCs) for individual sources classified with RF ranged from 0.2 to 23.2% higher than those of DA, with a mean difference of 9.0%. Additional evidence that RF outperformed DA was found in the comparison of training and test set ARCCs and in the examination of specific disputed isolates; RF produced higher ARCCs (ranging from 8 to 13% higher) than DA in all 1000 trials (excluding the two-way classification, in which RF outperformed DA in 999 of 1000 trials). This difference is of practical significance for the analysis of bacterial source tracking data. Overall, based on both DA and RF results, migratory birds were found to be the source of the largest portion of the unknown E. coli isolates. This study is the first known published application of Random Forests in the field of BST. Copyright 2010 Elsevier Ltd. All rights reserved.
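The RCC and ARCC figures quoted in the abstract can be illustrated with a minimal sketch. It assumes that an RCC is the fraction of a source's library isolates assigned to the correct source and that the ARCC is the unweighted mean of the per-source RCCs; the abstract does not state the exact definitions, and some studies instead report the overall fraction of isolates classified correctly. The function names and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def rates_of_correct_classification(true_sources, predicted_sources):
    """Return {source: RCC}: the fraction of each source's isolates classified correctly."""
    totals = Counter(true_sources)
    correct = Counter(
        t for t, p in zip(true_sources, predicted_sources) if t == p
    )
    return {source: correct[source] / totals[source] for source in totals}


def average_rate_of_correct_classification(true_sources, predicted_sources):
    """ARCC, taken here as the unweighted mean of the per-source RCCs (an assumption)."""
    rccs = rates_of_correct_classification(true_sources, predicted_sources)
    return sum(rccs.values()) / len(rccs)


# Hypothetical toy library of 10 isolates (not data from the study).
true_sources = ["human"] * 4 + ["nonhuman"] * 6
predicted = [
    "human", "human", "human", "nonhuman",          # 3 of 4 human isolates correct
    "nonhuman", "nonhuman", "nonhuman",
    "nonhuman", "nonhuman", "human",                # 5 of 6 nonhuman isolates correct
]

rccs = rates_of_correct_classification(true_sources, predicted)
arcc = average_rate_of_correct_classification(true_sources, predicted)
print(rccs)
print(round(arcc, 3))
```

Running the same two functions on the predictions of two classifiers for the same library would give the kind of side-by-side RCC and ARCC comparison the abstract reports for DA and RF.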

Year: 2010   PMID: 20566209   DOI: 10.1016/j.watres.2010.05.019

Source DB: PubMed   Journal: Water Res   ISSN: 0043-1354   Impact factor: 11.236


  14 in total

1.  Hydrometeorological variables predict fecal indicator bacteria densities in freshwater: data-driven methods for variable selection.

Authors:  Rachael M Jones; Li Liu; Samuel Dorevitch
Journal:  Environ Monit Assess       Date:  2012-06-27       Impact factor: 2.513

2.  Identification of source of faecal pollution of Tirumanimuttar River, Tamilnadu, India using microbial source tracking.

Authors:  Kasi Murugan; Perumal Prabhakaran; Saleh Al-Sohaibani; Kuppusamy Sekar
Journal:  Environ Monit Assess       Date:  2011-10-20       Impact factor: 2.513

3.  Microbial source tracking using metagenomics and other new technologies. (Review)

Authors:  Shahbaz Raza; Jungman Kim; Michael J Sadowsky; Tatsuya Unno
Journal:  J Microbiol       Date:  2021-02-10       Impact factor: 3.422

4.  Discovering new indicators of fecal pollution. (Review)

Authors:  Sandra L McLellan; A Murat Eren
Journal:  Trends Microbiol       Date:  2014-09-05       Impact factor: 17.079

5.  Combined phylogenetic and genomic approaches for the high-throughput study of microbial habitat adaptation. (Review)

Authors:  Jesse R R Zaneveld; Laura Wegener Parfrey; Will Van Treuren; Catherine Lozupone; Jose C Clemente; Dan Knights; Jesse Stombaugh; Justin Kuczynski; Rob Knight
Journal:  Trends Microbiol       Date:  2011-08-25       Impact factor: 17.079

6.  Bayesian community-wide culture-independent microbial source tracking.

Authors:  Dan Knights; Justin Kuczynski; Emily S Charlson; Jesse Zaneveld; Michael C Mozer; Ronald G Collman; Frederic D Bushman; Rob Knight; Scott T Kelley
Journal:  Nat Methods       Date:  2011-07-17       Impact factor: 28.547

7.  Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests.

Authors:  João Maroco; Dina Silva; Ana Rodrigues; Manuela Guerreiro; Isabel Santana; Alexandre de Mendonça
Journal:  BMC Res Notes       Date:  2011-08-17

8.  Real-data comparison of data mining methods in prediction of diabetes in Iran.

Authors:  Lily Tapak; Hossein Mahjub; Omid Hamidi; Jalal Poorolajal
Journal:  Healthc Inform Res       Date:  2013-09-30

9.  Tracking antibiotic resistance gene pollution from different sources using machine-learning classification.

Authors:  Li-Guan Li; Xiaole Yin; Tong Zhang
Journal:  Microbiome       Date:  2018-05-24       Impact factor: 14.650

10.  Fecal source identification using random forest.

Authors:  Adélaïde Roguet; A Murat Eren; Ryan J Newton; Sandra L McLellan
Journal:  Microbiome       Date:  2018-10-18       Impact factor: 14.650
