Evaluation of the Performances of Ribosomal Database Project (RDP) Classifier for Taxonomic Assignment of 16S rRNA Metabarcoding Sequences Generated from Illumina-Solexa NGS.

Giovanni Bacci, Alessia Bani, Marco Bazzicalupo, Maria Teresa Ceccherini, Marco Galardini, Paolo Nannipieri, Giacomo Pietramellara, Alessio Mengoni.

Abstract

Here we report a benchmark of the effect of bootstrap cut-off values of the RDP Classifier tool in terms of data retention along the different taxonomic ranks by using Illumina reads. Results provide guidelines for planning sequencing depths and selection of bootstrap cut-off in taxonomic assignments.

Keywords:  16S rRNA; OTU clustering; bacterial communities; metabarcoding; ribosomal database project

Year:  2015        PMID: 25653722      PMCID: PMC4316179          DOI: 10.7150/jgen.9204

Source DB:  PubMed          Journal:  J Genomics


Introduction

Massive sequencing of the 16S rRNA gene has greatly improved the ability to describe the taxonomic composition and functionality of microbial communities [1]. As DNA sequencing costs have fallen, many studies have used amplicon libraries to describe microbial communities taxonomically in a wide range of environments. The large number of sequence reads can be taxonomically assigned by comparison with classified sequences held in dedicated 16S rRNA gene databases, such as SILVA [2], Greengenes [3] or the Ribosomal Database Project [4]. One of the most popular tools for assigning sequence reads to the prokaryotic taxonomy is the Naïve Bayesian Classifier hosted by the Ribosomal Database Project (RDP Classifier) [4]. The RDP Classifier uses a very fast algorithm, based on Bayes' theorem, that is suitable for analyzing large amounts of sequence data. The algorithm has been tested on near-full-length 16S rRNA sequences and on randomly generated 16S rRNA sequence fragments of 400, 200, 100 and 50 bases from 5,014 type strains belonging to 988 genera [4]. Overall accuracies above 88.7% and 83.2% were reported for the 400- and 200-base segments, respectively, very similar to the accuracy obtained with near-full-length 16S rRNA sequences [4]. Average accuracies at the genus level of 71.1% and 51.5% were found for the 100- and 50-base segments, respectively. However, these results were obtained with a fragment dataset built from sequences of taxonomically well-defined organisms. No data have been reported for datasets composed of 16S rRNA sequences from specific regions of the 16S molecule (e.g. V3 and V6) obtained after amplification of DNA from environmental samples.
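To make the idea of a Bayes-based assignment with a bootstrap confidence concrete, the following is a minimal toy sketch of the general technique, not the RDP Classifier implementation. The word size, the smoothing constants, the uniform prior over genera, the number of bootstrap trials, the fraction of words resampled per trial and all function names are illustrative assumptions.

```python
import math
import random
from collections import Counter, defaultdict

K = 4  # illustrative word size (assumption, kept small for the toy data)

def kmers(seq, k=K):
    """Return the set of overlapping k-mers (words) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def train(references):
    """references: list of (genus, sequence) pairs.
    Returns per-genus word counts and the number of sequences per genus."""
    word_counts = defaultdict(Counter)  # genus -> word -> sequences containing it
    genus_sizes = Counter()             # genus -> number of training sequences
    for genus, seq in references:
        genus_sizes[genus] += 1
        word_counts[genus].update(kmers(seq))
    return word_counts, genus_sizes

def log_score(words, genus, word_counts, genus_sizes):
    """Sum of log P(word | genus), with a simple smoothing (assumption)."""
    m = genus_sizes[genus]
    return sum(math.log((word_counts[genus][w] + 0.5) / (m + 1)) for w in words)

def classify(words, word_counts, genus_sizes):
    """Pick the genus with the highest log-likelihood (uniform prior assumed)."""
    return max(sorted(genus_sizes),
               key=lambda g: log_score(words, g, word_counts, genus_sizes))

def bootstrap_confidence(seq, word_counts, genus_sizes, trials=100, seed=0):
    """Classify seq, then report the fraction of bootstrap trials, each using
    a random subset of the words, that agree with the full assignment."""
    words = sorted(kmers(seq))
    if not words:
        raise ValueError("sequence is shorter than the word size")
    rng = random.Random(seed)
    best = classify(words, word_counts, genus_sizes)
    subset_size = max(1, len(words) // 8)  # illustrative subset fraction
    hits = 0
    for _ in range(trials):
        sample = rng.choices(words, k=subset_size)  # sampling with replacement
        if classify(sample, word_counts, genus_sizes) == best:
            hits += 1
    return best, hits / trials

if __name__ == "__main__":
    # Hypothetical reference sequences, for illustration only.
    refs = [("GenusA", "ACGTACGTACGTAAAA"), ("GenusB", "TTGGCCTTGGCCTTGG")]
    wc, gs = train(refs)
    print(bootstrap_confidence("ACGTACGTACGTAAAA", wc, gs))
```

A read is then reported as assigned at a given rank only if this confidence reaches the chosen bootstrap cutoff, which is the parameter benchmarked in this study.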
In recent years, Illumina sequencing has emerged as one of the most popular sequencing technologies, thanks to its lower price, higher number of generated sequences and higher accuracy compared with pyrosequencing and Ion Torrent technologies [1, 5-8]. Different Illumina platforms are available, at different sequencing costs, which provide different numbers of reads and different read lengths (see for instance http://www.illumina.com/systems/sequencing.ilmn). Additionally, Illumina reads are usually 100-200 nt long (depending on the technique used), and 16S rRNA amplicon studies have focused on single variable regions of the 16S rRNA gene, such as V3, V4 or V6, which are approximately 100-300 bp long. Consequently, there is concern about how many reads to generate and how to set the RDP Classifier bootstrap threshold in order to obtain biologically meaningful data. More specifically, information is lacking on the percentage of reads that can be assigned to the various phylogenetic levels with Illumina 16S rRNA metabarcoding. Here, we report a benchmark of the RDP Classifier based on environmental sequence datasets obtained with Illumina sequencing technology. In particular, we investigated the effect of bootstrap cutoff values on the accuracy of taxonomic attribution of Illumina reads. The results provide a guideline for selecting optimal bootstrap cutoff values in terms of data retention along the different taxonomic ranks.
Five datasets of 16S rRNA gene Illumina reads, generated from environmental DNA, were analyzed (Table 1). These datasets contain a high number of reads per sample (from 28634 to 759518 reads per sample) and include reads obtained from the V3, V4 and V6 regions. Reads in the analyzed datasets were trimmed with StreamingTrim version 1.0 [9] before taxonomic assignment with the RDP Classifier.
The proportion of assigned reads in relation to the bootstrap cutoff value (from 0.1 to 1.0 in increments of 0.1) for each taxonomic level (from domain to genus) is reported in Figure 1. As expected, the proportion of assigned reads decreased going down the taxonomic levels, from phylum (from a mean of 100% to a mean of 25%, in the two datasets) to genus (from a mean of 60% to a mean of less than 5%, in the two datasets). It is worth noting that in all datasets, which together covered three variable regions (V3, V4 and V6) of the 16S rRNA gene, more than 25% of the reads could be assigned to the family level using a bootstrap cutoff value of 0.5 (the default cut-off value reported in the RDP Classifier tutorial). Moreover, even at higher cutoff values (> 0.8) an appreciable proportion of reads was still assigned (5%-10%). Interestingly, the V3 region performed better in taxonomic attribution at the order and family levels, indicating that even highly stringent bootstrap cut-off values (e.g. 0.7) may allow more reads to be assigned from the V3 region than from the V4 and V6 regions, which consequently proved less taxonomically informative.
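The retention measure described above can be sketched as follows. This is a minimal illustration, assuming each read has already been reduced to a mapping from rank to bootstrap confidence and that a read counts as assigned when its confidence is greater than or equal to the cutoff. The rank list, the data layout and the function name are assumptions, not the pipeline used in this study.

```python
# Ranks from domain to genus; the intermediate ranks are assumed here.
RANKS = ["domain", "phylum", "class", "order", "family", "genus"]

def retention_by_cutoff(reads, cutoffs=None):
    """reads: list of dicts mapping rank -> bootstrap confidence in [0, 1].
    Returns {rank: {cutoff: fraction of reads with confidence >= cutoff}}."""
    if not reads:
        raise ValueError("at least one read is required")
    if cutoffs is None:
        # 0.1, 0.2, ..., 1.0, rounded to avoid floating-point drift
        cutoffs = [round(0.1 * i, 1) for i in range(1, 11)]
    table = {}
    for rank in RANKS:
        table[rank] = {
            c: sum(1 for r in reads if r.get(rank, 0.0) >= c) / len(reads)
            for c in cutoffs
        }
    return table

if __name__ == "__main__":
    # Two made-up reads, for illustration only.
    example = [
        {"domain": 1.0, "phylum": 0.9, "family": 0.6, "genus": 0.4},
        {"domain": 1.0, "phylum": 0.5, "family": 0.3, "genus": 0.1},
    ]
    for rank, row in retention_by_cutoff(example).items():
        print(rank, row)
```

Ranks missing from a read are treated as confidence 0, so such a read is never counted as assigned at that rank.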
Table 1

Description of the datasets used*.

| BioProject* | rRNA region | Average read length | Number of samples | Average number of sequences per sample | Environment |
|---|---|---|---|---|---|
| PRJEB6047 | V3 | 302 bp | 72 | 61023 | Subgingival, supragingival, and tongue plaque from healthy and periodontal subjects |
| PRJNA245381 | V3 | 300 bp | 100 | 28634 | Soil contaminated with increasing level of ionic Ag |
| PRJNA217938 | V4 | 288 bp | 25 | 476230 | Samples from the surface to depth in Upper Mystic Lake, Winchester, MA |
| PRJNA238275 | V4 | 251 bp | 6 | 759518 | Soil associated with the rhizosphere of the coffee plant (Coffea canephora) in Brazil |
| PRJNA188383 | V6 | 200 bp | 48 | 66887 | Seawater and surface sediments retrieved from the Arctic Ocean |

* The accession ID (http://www.ncbi.nlm.nih.gov/bioproject/), the variable region sequenced, the type of reads, the number of different samples analyzed and the number of reads are shown.

Figure 1

Effect of bootstrap cut-off thresholds on the number of reads. The percentage of trimmed reads assigned to each taxonomic level is reported versus RDP bootstrap cut-off values. Shaded lines correspond to the 95% confidence interval assuming normality.
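A 95% confidence interval under a normality assumption, as mentioned in the caption, can be sketched as below. The use of the normal quantile 1.96 with the standard error of the mean across samples is an assumption; the exact procedure behind the figure is not stated here, and the function name is hypothetical.

```python
import math
import statistics

def mean_ci95(values):
    """Return (mean, lower, upper) for a normal-approximation 95% interval.
    values: per-sample percentages of assigned reads at one cutoff and rank."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two samples are needed for an interval")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    half_width = 1.96 * sem                        # normal quantile (assumption)
    return mean, mean - half_width, mean + half_width

if __name__ == "__main__":
    # Made-up percentages for three samples, for illustration only.
    print(mean_ci95([10.0, 20.0, 30.0]))
```

With few samples per dataset, a Student's t quantile would give a wider interval than the fixed 1.96 used in this sketch.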

The assignment trend at the genus level (the lowest taxonomic level that can be obtained using the RDP Classifier) was then inspected (Figure 2). Here too, the V3 region performed better than the V4 and V6 regions in retaining taxonomic information as bootstrap values were lowered, especially at bootstrap cutoff values of 0.5 and lower. A pseudo-fit curve was also produced (Supplementary Material: Figure S1), which may allow researchers to infer the percentage of sequences that could be assigned to the genus level at different RDP bootstrap cut-offs.
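One way such a curve could be used is sketched below. It is a hedged illustration only: it substitutes plain linear interpolation between measured points for the pseudo-fit of Figure S1, whose functional form is not given here, and then converts the estimated retention into a sequencing depth. The function names and the numeric values in the example are hypothetical.

```python
import math

def interpolate_retention(curve, cutoff):
    """curve: list of (cutoff, assigned fraction) points sorted by cutoff.
    Returns the linearly interpolated assigned fraction at the given cutoff."""
    if not curve:
        raise ValueError("curve must contain at least one point")
    if not curve[0][0] <= cutoff <= curve[-1][0]:
        raise ValueError("cutoff lies outside the measured range")
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= cutoff <= x1:
            if x1 == x0:
                return y0
            t = (cutoff - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return curve[-1][1]  # single-point curve

def reads_needed(target_assigned, fraction):
    """Raw reads per sample required to keep target_assigned reads
    when only the given fraction is expected to be assigned."""
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    return math.ceil(target_assigned / fraction)

if __name__ == "__main__":
    # Made-up curve points, for illustration only.
    curve = [(0.5, 0.30), (0.8, 0.10)]
    fraction = interpolate_retention(curve, 0.65)
    print(fraction, reads_needed(10000, fraction))
```

The depth estimate assumes that retention is independent of depth, which holds only if additional reads are drawn from the same community composition.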
Figure 2

Percentage of assigned reads with respect to bootstrap cut-off thresholds at the genus level. Plots report the assigned reads for all datasets analyzed. Shaded lines correspond to the 95% confidence interval assuming normality.

In conclusion, Illumina reads shorter than 200 nt can be classified using one of the most common 16S rRNA sequence classifiers, the RDP Classifier. As one would expect, increasing the bootstrap cutoff value decreases the number of assigned sequences. However, even at cutoffs higher than those indicated in the RDP Classifier tutorial, approximately 20-30% of the analyzed reads were still assigned. These results indicate that Illumina-based metabarcode sequencing of the 16S rRNA gene can provide reliable information on the taxonomic composition of a community at the genus level, even using classification software not specifically designed for this type of sequence. The reported models for trend plots can guide experimentalists in choosing the sequencing depth best suited to retaining an appreciable number of assigned reads at different taxonomic resolutions.
Supplementary Material: Figure S1.
References

T Z DeSantis; P Hugenholtz; N Larsen; M Rojas; E L Brodie; K Keller; T Huber; D Dalevi; P Hu; G L Andersen. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol. 2006-07.

Qiong Wang; George M Garrity; James M Tiedje; James R Cole. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol. 2007-06-22.

G Bacci; M Bazzicalupo; A Benedetti; A Mengoni. StreamingTrim 1.0: a Java software for dynamic trimming of 16S rRNA sequence data from metagenetic studies. Mol Ecol Resour. 2013-11-16.

Patrick H Degnan; Howard Ochman. Illumina-based analysis of microbial community diversity. ISME J. 2011-06-16.

Andrea K Bartram; Michael D J Lynch; Jennifer C Stearns; Gabriel Moreno-Hagelsieb; Josh D Neufeld. Generation of multimillion-sequence 16S rRNA gene libraries from complex microbial communities by assembling paired-end illumina reads. Appl Environ Microbiol. 2011-04-01.

Stephen J Salipante; Toana Kawashima; Christopher Rosenthal; Daniel R Hoogestraat; Lisa A Cummings; Dhruba J Sengupta; Timothy T Harkins; Brad T Cookson; Noah G Hoffman. Performance comparison of Illumina and ion torrent next-generation sequencing platforms for 16S rRNA-based bacterial community profiling. Appl Environ Microbiol. 2014-09-26.

Andre P Masella; Andrea K Bartram; Jakub M Truszkowski; Daniel G Brown; Josh D Neufeld. PANDAseq: paired-end assembler for illumina sequences. BMC Bioinformatics. 2012-02-14.

Marcus J Claesson; Qiong Wang; Orla O'Sullivan; Rachel Greene-Diniz; James R Cole; R Paul Ross; Paul W O'Toole. Comparison of two next-generation sequencing technologies for resolving highly complex microbiota composition using tandem variable 16S rRNA gene regions. Nucleic Acids Res. 2010-09-29.

Gregory B Gloor; Ruben Hummelen; Jean M Macklaim; Russell J Dickson; Andrew D Fernandes; Roderick MacPhee; Gregor Reid. Microbiome profiling by illumina sequencing of combinatorial sequence-tagged PCR products. PLoS One. 2010-10-26.

Christian Quast; Elmar Pruesse; Pelin Yilmaz; Jan Gerken; Timmy Schweer; Pablo Yarza; Jörg Peplies; Frank Oliver Glöckner. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2012-11-28.
