
Unraveling city-specific signature and identifying sample origin locations for the data from CAMDA MetaSUB challenge.

Runzhi Zhang, Alejandro R Walker, Susmita Datta.

Abstract

BACKGROUND: The composition of microbial communities can be location-specific, and differences in taxon abundance between locations could help unravel city-specific signatures and accurately predict where samples originated. In this study, whole genome shotgun (WGS) metagenomics data from samples across 16 cities around the world, and from samples from another 8 cities, were provided as the main and mystery datasets, respectively, as part of the CAMDA 2019 MetaSUB "Forensic Challenge". Feature selection, normalization, three machine learning methods, PCoA (Principal Coordinates Analysis) and ANCOM (Analysis of Composition of Microbiomes) were applied to both the main and mystery datasets.
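The abstract does not say how features were selected or normalized, so the following is only a minimal sketch of that preprocessing under stated assumptions: "common features" is taken to mean taxa observed in both the main and mystery datasets, normalization is assumed to be total-sum scaling to relative abundance followed by a log transform with a small pseudocount, and the error rate is the plain misclassification rate. All function names and the toy taxa/counts are hypothetical.

```python
import math


def common_features(main_taxa, mystery_taxa):
    """Hypothetical feature selection: keep taxa observed in BOTH datasets."""
    return sorted(set(main_taxa) & set(mystery_taxa))


def relative_abundance(counts):
    """Total-sum scaling (an assumed normalization): each count / sample total."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: c / total for taxon, c in counts.items()}


def to_feature_vector(sample_counts, features, pseudocount=1e-6):
    """Normalize over all of a sample's taxa, then keep only the selected
    features and log-transform them (the pseudocount avoids log(0))."""
    rel = relative_abundance(sample_counts)
    return [math.log(rel.get(f, 0.0) + pseudocount) for f in features]


def error_rate(true_labels, predicted_labels):
    """Fraction of samples whose predicted city differs from the true city."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


# Toy usage with made-up taxa and counts (not data from the study).
main_taxa = ["Bacillus", "Escherichia", "Pseudomonas"]
mystery_taxa = ["Escherichia", "Pseudomonas", "Streptococcus"]
features = common_features(main_taxa, mystery_taxa)
sample = {"Bacillus": 20, "Escherichia": 40, "Pseudomonas": 40}
vector = to_feature_vector(sample, features)
print(features)  # ['Escherichia', 'Pseudomonas']
print(error_rate(["NYC", "LON", "TOK", "LON"], ["NYC", "LON", "LON", "LON"]))  # 0.25
```

The resulting vectors would then be passed to whichever classifiers are being compared; the keywords name linear discriminant analysis, Random Forest and support vector machines.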
RESULTS: Feature selection combined with the machine learning methods revealed that the combination of the common features was effective for predicting the origin of the samples. Averaged over the three machine learning methods, the error rates were 11.93% and 30.37% for the main and mystery datasets, respectively. When samples from the main dataset were used to predict the labels of samples from the mystery dataset, nearly 89.98% of the test samples could be correctly labeled as "mystery" samples. PCoA showed that nearly 60% of the total variability in the data could be explained by the first two PCoA axes. Although many cities overlapped, some cities were separated in the PCoA. The ANCOM results, combined with the importance scores from the Random Forest (RF), indicated that the common features at the "family" and "order" levels in the main dataset, and at the "order" level in the mystery dataset, provided the most efficient information for prediction.
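To illustrate what "variability explained by the first two PCoA axes" means: PCoA double-centers the matrix of squared pairwise dissimilarities and eigendecomposes it, and each axis's share is its eigenvalue divided by the sum of the positive eigenvalues. The sketch below assumes Bray-Curtis dissimilarity (the abstract does not name the distance used) and uses a plain Jacobi eigenvalue routine so it needs only the standard library. Conventions for handling negative eigenvalues differ between tools, and the function names are hypothetical.

```python
import math


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def gower_center(d):
    """Double-center -0.5 * D^2 (Gower centering), as in classical PCoA."""
    n = len(d)
    a = [[-0.5 * d[i][j] ** 2 for j in range(n)] for i in range(n)]
    mean = [sum(row) / n for row in a]  # row means == column means (symmetric)
    grand = sum(mean) / n
    return [[a[i][j] - mean[i] - mean[j] + grand for j in range(n)] for i in range(n)]


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, descending."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def pcoa_variance_explained(samples, axes=2):
    """Share of variability captured by the leading PCoA axes."""
    d = [[bray_curtis(x, y) for y in samples] for x in samples]
    eigenvalues = jacobi_eigenvalues(gower_center(d))
    total = sum(e for e in eigenvalues if e > 0)
    if total == 0:
        return [0.0] * axes
    return [e / total for e in eigenvalues[:axes]]


# Toy usage: three samples, each dominated by a different taxon (made-up counts).
toy = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
print(pcoa_variance_explained(toy))  # approximately [0.5, 0.5]
```

The Jacobi routine is used only to keep the example dependency-free; for datasets with hundreds of samples a library eigensolver would be much faster.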
CONCLUSIONS: The classification results suggested that the composition of the microbiome was distinctive across cities and could be used to identify sample origins. This was also supported by the ANCOM results and the RF importance scores. In addition, prediction accuracy could be improved with more samples and greater sequencing depth.

Keywords:  ANCOM; Linear discriminant analysis; Machine learning; Microbiome; OTU; PCoA; Random Forest; Support vector machine; WGS

Year:  2021        PMID: 33397406      PMCID: PMC7780616          DOI: 10.1186/s13062-020-00284-1

Source DB:  PubMed          Journal:  Biol Direct        ISSN: 1745-6150            Impact factor:   4.540

