| Literature DB >> 33363701 |
Nicholas A Bokulich, Michal Ziemski, Michael S Robeson, Benjamin D Kaehler.
Abstract
Microbiomes are integral components of diverse ecosystems, and increasingly recognized for their roles in the health of humans, animals, plants, and other hosts. Given their complexity (both in composition and function), the effective study of microbiomes (microbiomics) relies on the development, optimization, and validation of computational methods for analyzing microbial datasets, such as from marker-gene (e.g., 16S rRNA gene) and metagenome data. This review describes best practices for benchmarking and implementing computational methods (and software) for studying microbiomes, with particular focus on unique characteristics of microbiomes and microbiomics data that should be taken into account when designing and testing microbiomics methods.Entities:
Keywords: Amplicon sequencing; Benchmarking; Best practices; Marker-gene sequencing; Metagenomics; Microbiome; Software development
Year: 2020 PMID: 33363701 PMCID: PMC7744638 DOI: 10.1016/j.csbj.2020.11.049
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. An overview of a microbiomics method development workflow. Typically, a method is developed to address one or more biological questions, e.g., concerning microbial composition (genetic or taxonomic), dynamics, or functional activity. Depending on the question, various data types can be used to feed a machine learning model or statistical test(s). Once developed, the method should be subjected to a suite of benchmarks to assess its performance. A range of choices must be made here: what data to use, which performance metrics to apply, and what kind of benchmark to employ. Finally, to optimize accessibility for the research community, the method should be implemented as a software package/plugin, applying best practices for software development, including version control, testing and continuous integration, documentation, and, finally, community support. Naturally, the three steps presented here may overlap in the development cycle; e.g., some benchmarking may already begin during the development phase. Software implementation also often starts early, with the first version of the working code. Generally, however, the transition from “develop” through “benchmark” to “implement” becomes natural as the project progresses.
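The “testing” practice named in the figure can be illustrated with a minimal unit-test sketch. The helper function and test names below are hypothetical, not from the paper; with pytest, checks this small can run automatically on every commit as part of continuous integration.

```python
# Minimal sketch of unit testing for a hypothetical normalization helper.
# Function and test names are illustrative assumptions, not from the paper.
import math

def relative_frequencies(counts):
    """Convert raw feature counts to relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]

def test_frequencies_sum_to_one():
    # Relative abundances of any sample should sum to 1
    freqs = relative_frequencies([120, 30, 50])
    assert math.isclose(sum(freqs), 1.0)

def test_order_is_preserved():
    # Feature order must not change during normalization
    assert relative_frequencies([3, 1]) == [0.75, 0.25]
```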
Fig. 2. Microbial diversity varies widely by sample type across the planet, as measured across 9787 samples from the Earth Microbiome Project [1]. A, Boxplots showing the distribution of alpha diversity (as Shannon entropy) within each sample type (boxes show quartile values; diamonds indicate outlier values). B, Unweighted UniFrac principal coordinates analysis (PCoA) measures similarity between samples based on community-wide phylogenetic similarity. Samples are categorized by their “empo_3” sample type. Pre-computed data (Shannon diversity and PCoA coordinates) were collected from the published EMP study data at ftp://ftp.microbio.me/emp/.
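As a concrete reference for panel A, Shannon entropy can be computed from a single sample's feature counts in a few lines. This is a hedged sketch with invented toy counts (not EMP data); note that the logarithm base differs between tools (scipy uses the natural log), so values are only comparable within one convention.

```python
# Sketch: Shannon entropy (alpha diversity) for one sample's feature counts.
# The counts are toy data for illustration only.
import numpy as np
from scipy.stats import entropy

counts = np.array([500, 250, 125, 125])  # one row of a feature table
shannon = entropy(counts)  # scipy normalizes to frequencies; -sum(p * ln(p))
print(f"Shannon entropy: {shannon:.3f} nats")  # ≈ 1.213
```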
A hypothetical confusion matrix counting samples that belong to three orders of Bacteria against the orders a classifier assigned to them (rows: predicted order; columns: true order).
| Predicted Order \ True Order | Lactobacillales | Pseudomonadales | Enterobacteriales | Total |
|---|---|---|---|---|
| Lactobacillales | 234 | 34 | 89 | 357 |
| Pseudomonadales | 56 | 142 | 21 | 219 |
| Enterobacteriales | 78 | 11 | 68 | 157 |
| Total | 368 | 187 | 178 | |
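From this matrix, per-class precision, recall, and F-measure (the metric named in Fig. 3) can be computed directly. A minimal sketch, using the values from the table above:

```python
# Per-class precision, recall, and F-measure from the confusion matrix above.
# Rows are predicted orders, columns are true orders, as in the table.
import numpy as np

orders = ["Lactobacillales", "Pseudomonadales", "Enterobacteriales"]
cm = np.array([[234,  34,  89],
               [ 56, 142,  21],
               [ 78,  11,  68]])

for i, order in enumerate(orders):
    precision = cm[i, i] / cm[i, :].sum()  # correct / all predicted as this order
    recall = cm[i, i] / cm[:, i].sum()     # correct / all truly this order
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{order}: precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

For Lactobacillales, for example, precision is 234/357 ≈ 0.66 and recall is 234/368 ≈ 0.64, giving F1 ≈ 0.65.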
Fig. 3. An example of a benchmarking workflow for development of a new taxonomic classification method. Test data can be retrieved from multiple sources to obtain: (1) reference sequences for cross-validation or simulation (e.g., using RESCRIPt [144]); (2) mock community data and known compositions (e.g., from mockrobiota [123]); and (3) biological data, e.g., microbiome sequence data from Qiita [142]. Data can either be classified directly to evaluate results (e.g., for mock community data, for which the true composition is known), or split into k folds for cross-validation, where at each iteration k−1 folds are used for model training (grey boxes) and the remaining fold (green box) is used to evaluate model performance. In the case of taxonomic classification, accuracy can be scored using metrics like F-measure. Resource utilization is also recorded and compared to the “gold standard” method of choice. If either metric is unsatisfactory, the model can be optimized (e.g., via a grid search of parameter settings) and the process repeated.
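The cross-validation and grid-search loop in Fig. 3 can be sketched with scikit-learn. This is an assumption-laden toy: synthetic features and a multinomial naive Bayes classifier stand in for real k-mer profiles and a production taxonomy classifier; only the structure (stratified k-fold splits, macro F-measure scoring, a parameter grid) mirrors the figure.

```python
# Hedged sketch of the Fig. 3 loop: k-fold cross-validation plus a grid
# search over one parameter, scored by macro F-measure. Data and model
# are toy stand-ins, not the paper's method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import MultinomialNB

# Synthetic stand-in for k-mer count features labeled with taxonomic classes.
X, y = make_classification(n_samples=300, n_features=50, n_informative=20,
                           n_classes=3, random_state=0)
X = np.abs(X)  # MultinomialNB requires non-negative features

# Each CV iteration trains on k-1 folds and evaluates on the held-out fold.
search = GridSearchCV(
    MultinomialNB(),
    param_grid={"alpha": [0.001, 0.01, 0.1, 1.0]},
    scoring="f1_macro",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print("best alpha:", search.best_params_,
      "macro F1:", round(search.best_score_, 3))
```

If the best score is still unsatisfactory, widening the parameter grid and re-running corresponds to the “optimize and repeat” arrow in the figure.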