Daniel McDonald (1, 2), Amanda Birmingham (3), Rob Knight (4, 5)
Abstract
Human microbiome reference datasets provide epidemiological context for researchers, enabling them to uncover new insights into their own data through meta-analyses. In addition, large and comprehensive reference sets offer a means to develop or test hypotheses and can inform practical study design decisions such as sample size. We discuss the importance of reference sets in human microbiome research, the limitations of existing resources, the technical challenges of employing reference sets, examples of their use, and the contributions of the American Gut Project to the development of a comprehensive reference set. By engaging the general public, the American Gut Project aims to address many of the issues present in existing reference resources, characterizing the health and disease status, lifestyle, and dietary choices of its participants while extending its efforts globally through international collaborations.
Year: 2015; PMID: 26530830; PMCID: PMC4632476; DOI: 10.1186/s40168-015-0117-2
Source DB: PubMed; Journal: Microbiome; ISSN: 2049-2618; Impact factor: 14.650
A comparison of OTU-picking strategies
| Strategy | Pros | Cons | Data combination bias |
|---|---|---|---|
| Closed-reference | • Is extremely parallelizable<br>• Computes reference assignments only once<br>• Is highly unlikely to retain non-16S sequences<br>• Supports reads or fragments from multiple loci<br>• Inherits phylogeny and taxonomy directly from the reference | • Is limited to finding diversity present in the OTU reference | • May show large bias if combining studies with differential representation in the reference |
| De novo | • Utilizes all of the sequences<br>• Requires no OTU database<br>• Can group organisms distinct from anything seen before<br>• May produce phylogenies sensitive to subtle differences in OTUs | • Must hold all sequence data in memory<br>• Is very complex to parallelize<br>• Produces spurious OTUs without pre-filtering<br>• Is infeasible if data are from multiple loci<br>• Must redo OTU picking on all data whenever studies are combined | • May generate spurious OTUs if combining studies with differential error profiles |
| Open-reference | • Leverages an OTU database but also utilizes sequences that do not match that database<br>• Is modestly parallelizable | • Produces spurious OTUs without pre-filtering<br>• Is infeasible if data are from multiple loci<br>• Must redo OTU picking on all data whenever studies are combined | • Shows less bias due to differential diversity representation than closed-reference<br>• Shows less bias due to differential error profiles than de novo |
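The trade-offs in the table can be illustrated with a toy sketch. It is not the implementation of any published pipeline: the helpers `identity`, `closed_reference`, `de_novo`, and `open_reference` are hypothetical, "identity" is a gapless positional match fraction standing in for a real alignment, and the de novo step is a simplified greedy centroid clustering. The 10-base reads, reference sequences, and 0.9 threshold are invented for illustration; real studies typically cluster much longer reads at around 97% identity.

```python
from __future__ import annotations


def identity(a: str, b: str) -> float:
    """Toy gapless identity: matching positions divided by the longer length.

    Hypothetical stand-in for a real pairwise alignment; ignores indels.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def closed_reference(
    reads: dict[str, str], refs: dict[str, str], threshold: float
) -> tuple[dict[str, str], list[str]]:
    """Assign each read to its best-matching reference OTU, or mark it failed.

    Each read is compared only against the fixed reference, so reads can be
    processed independently of one another.
    """
    assigned: dict[str, str] = {}
    failed: list[str] = []
    for read_id, seq in reads.items():
        best_id, best_score = None, 0.0
        for ref_id, ref_seq in refs.items():
            score = identity(seq, ref_seq)
            if score > best_score:  # ties keep the first reference seen
                best_id, best_score = ref_id, score
        if best_id is not None and best_score >= threshold:
            assigned[read_id] = best_id
        else:
            failed.append(read_id)  # diversity absent from the reference is lost
    return assigned, failed


def de_novo(
    reads: dict[str, str], threshold: float, prefix: str = "denovo"
) -> dict[str, str]:
    """Simplified greedy centroid clustering, in input order.

    Each read joins the first centroid within the threshold or founds a new
    OTU, so every label depends on the other reads in the same run.
    """
    centroids: dict[str, str] = {}
    assigned: dict[str, str] = {}
    for read_id, seq in reads.items():
        for otu_id, centroid in centroids.items():
            if identity(seq, centroid) >= threshold:
                assigned[read_id] = otu_id
                break
        else:  # no centroid matched: this read becomes a new centroid
            new_id = f"{prefix}{len(centroids)}"
            centroids[new_id] = seq
            assigned[read_id] = new_id
    return assigned


def open_reference(
    reads: dict[str, str], refs: dict[str, str], threshold: float
) -> dict[str, str]:
    """Closed-reference first, then de novo clustering of the failed reads."""
    assigned, failed = closed_reference(reads, refs, threshold)
    leftovers = {read_id: reads[read_id] for read_id in failed}
    assigned.update(de_novo(leftovers, threshold, prefix="new"))
    return assigned


# Invented 10-base toy data; real amplicon reads are far longer.
REFS = {"refA": "ACGTACGTAA", "refB": "TTTTGGGGCC"}
READS = {
    "r1": "ACGTACGTAA",  # identical to refA
    "r2": "ACGTACGTAT",  # one mismatch from refA
    "r3": "GGGGCCCCAA",  # unlike either reference
    "r4": "GGGGCCCCAT",  # one mismatch from r3
}

if __name__ == "__main__":
    THRESHOLD = 0.9  # allows one mismatch in 10 bases
    print(closed_reference(READS, REFS, THRESHOLD))
    # ({'r1': 'refA', 'r2': 'refA'}, ['r3', 'r4'])
    print(de_novo(READS, THRESHOLD))
    # {'r1': 'denovo0', 'r2': 'denovo0', 'r3': 'denovo1', 'r4': 'denovo1'}
    print(open_reference(READS, REFS, THRESHOLD))
    # {'r1': 'refA', 'r2': 'refA', 'r3': 'new0', 'r4': 'new0'}
```

In this toy run, closed-reference discards r3 and r4 because nothing in the reference lies within the threshold. De novo keeps them, but its OTU labels are defined by the other reads in the run, which is why combining studies means re-picking. Open-reference keeps the reference labels for r1 and r2 and clusters only the leftovers.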