
Rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions.

Jens Reeder, Rob Knight.   

Year:  2010        PMID: 20805793      PMCID: PMC2945879          DOI: 10.1038/nmeth0910-668b

Source DB:  PubMed          Journal:  Nat Methods        ISSN: 1548-7091            Impact factor:   28.547

Pyrosequencing [1] has revolutionized microbial community analysis by allowing hundreds of microbial communities to be assessed simultaneously in multiplex, with sufficient depth to resolve meaningful biological patterns [2]. These techniques have yielded striking new insight into microbial processes on scales ranging from continents [3] to sites within an individual’s body [4]. Powerful new analysis tools such as GAST [5], Mothur [6], and QIIME [7] greatly streamline the interpretation of microbial community information obtained by pyrosequencing, especially similarities and differences among communities. Nevertheless, substantial questions remain about the suitability of pyrosequencing for measuring alpha diversity (the amount of diversity within each individual community) and non-phylogenetic beta diversity. (Phylogenetic beta-diversity measures such as UniFrac, which measure similarities between different communities, are relatively robust to these issues [8].) In particular, noise introduced during pyrosequencing and the PCR amplification stage can inflate estimates of the number of OTUs (operational taxonomic units, here chosen at the 97% identity level) in a given habitat by orders of magnitude [9, 10]. The current state of the art reduces noise by clustering the flowgrams (the patterns of intensities in each read) before conversion to sequences, which eliminates problems due to homopolymer read errors [10]. However, this approach is exceedingly computationally expensive and beyond the reach of most individual investigators who do not have access to large-scale computing facilities.

Methods

Inability to accurately determine which sequences are present in a sample, and hence the abundances of rare taxa, greatly inhibits our ability to infer important ecological parameters such as rank-abundance curves. Ironically, the portion of the rank-abundance curve that can be inferred, i.e. that of the common taxa, provides a solution to the expense of denoising. Empirical rank-abundance curves, especially from human-associated samples, tend to be dominated by a relatively small number of abundant taxa. Given this feature of actual microbial communities, performing all-against-all comparisons for clustering is exceedingly inefficient: instead, a subset of reads suffices to identify the common OTUs, and reads belonging to them can then be iteratively removed by recruitment to an existing cluster. Consequently, we can rapidly determine the OTUs that are most likely to be abundant, concentrate initially on comparing reads to this small number of abundant OTUs (removing matches from the analysis), and then cluster only the leftover reads, which represent more divergent sequences. We thus reduce the total number of sequence comparisons by exploiting empirical features of the abundance distribution of real datasets, as follows. First, we apply a fast pre-filter that removes reads that are strict prefixes of other reads, and compute an initial sequence distribution. We then sort the prefix clusters in descending order of abundance and use this initial distribution to cluster similar reads, comparing each additional unclustered read to the most abundant clusters first, because we expect the abundant clusters to yield a larger number of erroneous near-matching reads owing to their numerical dominance alone. For a more detailed description of the algorithm, see Supplementary Methods. A similar method of pre-clustering at the sequence level, followed by sequence clustering along the abundance distribution, has been proposed recently [11].
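
The two steps described above (prefix filtering, then abundance-ordered recruitment) can be illustrated with a minimal Python sketch. This is not the published implementation: the actual method compares flowgrams, whereas this sketch works on plain nucleotide strings, and the function names, the mismatch-count distance, and the `max_mismatches` threshold are all assumptions made only for illustration.

```python
def prefix_filter(reads):
    """Collapse each read into the first longer (or identical) read it is a prefix of.

    Returns {representative_read: number_of_reads_collapsed_into_it}.
    """
    clusters = {}
    # Longest reads first, so a candidate representative is always seen
    # before any of its prefixes.
    for read in sorted(reads, key=len, reverse=True):
        for rep in clusters:
            if rep.startswith(read):
                clusters[rep] += 1
                break
        else:
            clusters[read] = 1
    return clusters


def mismatches(a, b):
    """Count mismatching positions over the shared length (illustrative metric only)."""
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(prefix_clusters, max_mismatches=1):
    """Assign prefix clusters to centroids, trying the most abundant centroids first."""
    ordered = sorted(prefix_clusters.items(), key=lambda kv: kv[1], reverse=True)
    centroids = []  # [sequence, abundance] pairs, in descending order of seed abundance
    for seq, count in ordered:
        for centroid in centroids:
            if mismatches(centroid[0], seq) <= max_mismatches:
                centroid[1] += count  # recruited: no further comparisons needed
                break
        else:
            centroids.append([seq, count])  # divergent read seeds a new cluster
    return [(seq, count) for seq, count in centroids]


# Toy input (hypothetical reads, not data from the study).
reads = ["ACGTACGT"] * 5 + ["ACGTAC"] * 2 + ["ACGTACGA"] + ["TTTTGGGG"] * 3 + ["TTTTGG"]
print(greedy_cluster(prefix_filter(reads)))
# prints [('ACGTACGT', 8), ('TTTTGGGG', 4)]
```

In this toy input the single erroneous read `ACGTACGA` is recruited by the dominant cluster after one comparison, which is the saving the abundance ordering is meant to provide: most noisy reads derive from abundant sequences, so they tend to match early in the centroid list.
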
The method introduced here is a major improvement over previous flowgram-based denoising routines [10] in terms of compute resources, yet it retains the advantage that singletons are not discarded entirely, allowing exploration of the rare biosphere [12]. Previously, a mid-size 24-core cluster needed around 10 hours to analyze a small dataset of around 40,000 sequences. Our method allows the same dataset to be denoised in less than an hour on a single laptop computer (Table S1). We can also denoise full 454 runs of 500,000 sequences on a mid-size cluster in 1 day. We can thus address questions in community ecology that were previously intractable. Applying these new methods to the most comprehensive survey of human-associated body habitats yet performed [4], we find that denoising produces a substantial decrease in diversity, both at the OTU level and in terms of phylogenetic diversity (the total branch length associated with each sample on a phylogenetic tree [14]). However, the results from the non-denoised (but filtered) and denoised data are highly correlated (r² = 0.97, P < 10⁻³⁰⁰ for phylogenetic diversity), suggesting that relative results concerning diversity within each sample are robust to the types of errors introduced by pyrosequencing (Fig. 1a–f). Interestingly, in spite of this high correlation, denoising changes the relative order of OTU richness among individual body habitats. Although the gut exhibits the highest OTU richness without denoising, it falls back into the middle ranks after denoising. This holds true for both Chao1 estimates and phylogenetic diversity (Fig. 1a,d and 1b,e). The drastic reduction after denoising might be an effect of the sequence composition of the dominant OTUs in the gut (see Supplementary Methods for a more detailed discussion).
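
To make concrete why spurious rare OTUs matter for the Chao1 estimates mentioned above, the following Python sketch computes the bias-corrected Chao1 richness estimate, S_obs + F1(F1 − 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs observed exactly once and exactly twice. Which Chao1 variant was used in the study is not stated in the text, so the choice of the bias-corrected form, the function name, and the toy counts below are assumptions.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-OTU read counts."""
    counts = [c for c in otu_counts if c > 0]  # ignore OTUs absent from the sample
    s_obs = len(counts)                        # observed OTUs
    f1 = sum(1 for c in counts if c == 1)      # singletons
    f2 = sum(1 for c in counts if c == 2)      # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical sample: four real OTUs plus three singleton OTUs created by noise.
noisy = [10, 5, 2, 2, 1, 1, 1]
denoised = [10, 5, 2, 2]
print(chao1(noisy), chao1(denoised))
# prints 8.0 4.0
```

Because the estimator grows roughly with the square of the singleton count, noise-derived singletons inflate Chao1 faster than they inflate the raw OTU count, which is consistent with the large reductions reported after denoising.
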
Figure 1

Comparisons of non-denoised data (a–c) to denoised data (d–f) for alpha diversity for the Body Habitat study, and comparisons of beta diversity (g–h). Rarefaction plots of the “Body Habitat” study [4] show a 3- to 4-fold decrease in the Chao1 estimate when comparing non-denoised (a) to denoised (b) data. Interestingly, denoising changes the relative order of OTU richness of individual body habitats: the gut exhibits the highest OTU richness without denoising, but falls back into the middle ranks after denoising. This holds true for both Chao1 estimates and phylogenetic diversity (PD). c) Scatter plots of alpha diversity metrics per sample show a high correlation overall, but a significant deviation from the average for the gut and the oral cavity (EAC = external auditory canal). g) Procrustes analysis of denoised and filtered unweighted UniFrac principal coordinates analysis (PCoA). Bars connect identical samples in the plot, with the red side of the bar pointing towards the denoised data. There is no qualitative difference between denoised and filtered data in the overall clustering, yet on a smaller scale we observe that the denoised samples are oriented more towards the center than the filtered ones. This shows that denoising removes some of the artificial distance between samples introduced by false OTUs. h) Unweighted UniFrac distances for all pairs of samples for the denoised and filtered data sets are highly correlated (r² = 0.96). From the regression, it is clear that noise has a greater effect on similar samples than on dissimilar samples. The color bar gives the number of pairwise comparisons at a particular point.

Similarly, when clustering the samples using UniFrac, the non-denoised and denoised reads produce very similar patterns (Fig. 1g–h), reinforcing the point that errors introduced into each sample by noise or chimeras have little effect on beta diversity because they inflate the distances among all samples rather than introducing artifactual similarities between specific pairs of samples [15]. We conclude that the availability of these new methods will make more accurate assessments of alpha diversity available to a wide range of researchers (especially in conjunction with improved chimera-checking methods such as ChimeraSlayer, http://microbiomeutil.sourceforge.net/), and will greatly improve our understanding of microbial communities in habitats with scales ranging from global to extremely personal. The efficiency of the new techniques and the fact that they can change conclusions about the relative diversity in different habitats suggest that they should be applied routinely in all pyrosequencing studies where estimates of diversity within each sample are the goal.
References (10 of 17 shown; the numbering of this list does not follow the in-text citation numbers)

1.  Genome sequencing in microfabricated high-density picolitre reactors.

Authors:  Marcel Margulies; Michael Egholm; William E Altman; Said Attiya; Joel S Bader; Lisa A Bemben; Jan Berka; Michael S Braverman; Yi-Ju Chen; Zhoutao Chen; Scott B Dewell; Lei Du; Joseph M Fierro; Xavier V Gomes; Brian C Godwin; Wen He; Scott Helgesen; Chun Heen Ho; Chun He Ho; Gerard P Irzyk; Szilveszter C Jando; Maria L I Alenquer; Thomas P Jarvie; Kshama B Jirage; Jong-Bum Kim; James R Knight; Janna R Lanza; John H Leamon; Steven M Lefkowitz; Ming Lei; Jing Li; Kenton L Lohman; Hong Lu; Vinod B Makhijani; Keith E McDade; Michael P McKenna; Eugene W Myers; Elizabeth Nickerson; John R Nobile; Ramona Plant; Bernard P Puc; Michael T Ronan; George T Roth; Gary J Sarkis; Jan Fredrik Simons; John W Simpson; Maithreyan Srinivasan; Karrie R Tartaro; Alexander Tomasz; Kari A Vogt; Greg A Volkmer; Shally H Wang; Yong Wang; Michael P Weiner; Pengguang Yu; Richard F Begley; Jonathan M Rothberg
Journal:  Nature       Date:  2005-07-31       Impact factor: 49.962

2.  Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale.

Authors:  Christian L Lauber; Micah Hamady; Rob Knight; Noah Fierer
Journal:  Appl Environ Microbiol       Date:  2009-06-05       Impact factor: 4.792

3.  Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates.

Authors:  Victor Kunin; Anna Engelbrektson; Howard Ochman; Philip Hugenholtz
Journal:  Environ Microbiol       Date:  2009-08-27       Impact factor: 5.491

4.  PyNAST: a flexible tool for aligning sequences to a template alignment.

Authors:  J Gregory Caporaso; Kyle Bittinger; Frederic D Bushman; Todd Z DeSantis; Gary L Andersen; Rob Knight
Journal:  Bioinformatics       Date:  2009-11-13       Impact factor: 6.937

5.  Ironing out the wrinkles in the rare biosphere through improved OTU clustering.

Authors:  Susan M Huse; David Mark Welch; Hilary G Morrison; Mitchell L Sogin
Journal:  Environ Microbiol       Date:  2010-03-11       Impact factor: 5.491

6.  Bacterial community variation in human body habitats across space and time.

Authors:  Elizabeth K Costello; Christian L Lauber; Micah Hamady; Noah Fierer; Jeffrey I Gordon; Rob Knight
Journal:  Science       Date:  2009-11-05       Impact factor: 47.728

7.  Evolution of mammals and their gut microbes.

Authors:  Ruth E Ley; Micah Hamady; Catherine Lozupone; Peter J Turnbaugh; Rob Roy Ramey; J Stephen Bircher; Michael L Schlegel; Tammy A Tucker; Mark D Schrenzel; Rob Knight; Jeffrey I Gordon
Journal:  Science       Date:  2008-05-22       Impact factor: 47.728

8.  A core gut microbiome in obese and lean twins.

Authors:  Peter J Turnbaugh; Micah Hamady; Tanya Yatsunenko; Brandi L Cantarel; Alexis Duncan; Ruth E Ley; Mitchell L Sogin; William J Jones; Bruce A Roe; Jason P Affourtit; Michael Egholm; Bernard Henrissat; Andrew C Heath; Rob Knight; Jeffrey I Gordon
Journal:  Nature       Date:  2008-11-30       Impact factor: 49.962

9.  Accuracy and quality of massively parallel DNA pyrosequencing.

Authors:  Susan M Huse; Julie A Huber; Hilary G Morrison; Mitchell L Sogin; David Mark Welch
Journal:  Genome Biol       Date:  2007       Impact factor: 13.583

10.  Exploring microbial diversity and taxonomy using SSU rRNA hypervariable tag sequencing.

Authors:  Susan M Huse; Les Dethlefsen; Julie A Huber; David Mark Welch; David A Relman; Mitchell L Sogin
Journal:  PLoS Genet       Date:  2008-11-21       Impact factor: 5.917
