Jiro Nakayama1, Jiahui Jiang1, Koichi Watanabe2, Kangting Chen3, Huang Ninxin3, Kazunori Matsuda2, Takashi Kurakawa2, Hirokazu Tsuji2, Kenji Sonomoto1, Yuan-Kun Lee3.
Abstract
Pyrosequencing-based 16S rRNA profiling has become a common and powerful tool for determining the community structure of gastrointestinal tract microbiota, but it is still hard to process the massive amount of sequence data into microbial composition data, especially at the species level. Here we propose a new approach that combines the Quantitative Insights Into Microbial Ecology (QIIME), Mothur and Ribosomal Database Project (RDP) programs to efficiently process 454 pyrosequence data into bacterial composition data down to the species level. The approach was demonstrated to precisely convert batch sequence data of 16S rRNA V6-V8 amplicons obtained from adult Singaporean fecal samples into taxonomically annotated biota data.
Keywords: 16S rRNA gene; human gut microbiota; pyrosequence
Year: 2013 PMID: 24936364 PMCID: PMC4034321 DOI: 10.12938/bmfh.32.69
Source DB: PubMed Journal: Biosci Microbiota Food Health ISSN: 2186-3342
Fig. 1. Flowchart of computational processing of 454 pyrosequence data to individual microbiota data.
Fig. 2. Result of a chimera check of 15,578 nonredundant sequences. The scores of each sequence in de novo Uchime and DB Uchime were plotted in two dimensions. Grey dots represent sequences with read counts of less than 10. Black dots represent sequences with read counts of 10 or more, and sequences with 100 or more counts are surrounded by a square. Vertical and horizontal dotted lines show the cut-off threshold (score = 0.6) used for the chimera detection except in the case of SG.9_C1697.
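The two-threshold check described in the Fig. 2 caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a sequence is treated as chimeric only when both the de novo and DB Uchime scores reach the 0.6 cut-off (an assumption consistent with the combined chimera count in the processing summary being lower than either individual count), and it assumes the cut-off is inclusive. The function name and the example scores are hypothetical.

```python
CUTOFF = 0.6  # cut-off score given in the Fig. 2 caption

def is_chimera(denovo_score, db_score, cutoff=CUTOFF):
    """Flag a sequence only when BOTH checks reach the cut-off (assumed rule)."""
    return denovo_score >= cutoff and db_score >= cutoff

# Hypothetical (de novo score, DB score) pairs, for illustration only
scores = {
    "seq_a": (0.9, 0.8),  # high in both checks
    "seq_b": (0.9, 0.1),  # high in the de novo check only
    "seq_c": (0.2, 0.3),  # low in both checks
}

# Keep every sequence that is not flagged by both checks
nonchimeras = [name for name, (dn, db) in scores.items()
               if not is_chimera(dn, db)]
print(nonchimeras)
```

Exceptions such as the single sequence that the caption says was handled separately would need manual review outside a rule like this.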
The number and percentage of sequences and reads identified to known taxa*
| | Nonchimera^a | Phylum^b | Class^b | Order^b | Family^b | Genus^b | Species^c |
| Identified taxa # | - | 9 | 17 | 25 | 47 | 107 | 276 |
| Sequence # | 8,966 | 8,782 (97.9) | 8,689 (96.9) | 8,677 (96.8) | 8,309 (92.7) | 5,869 (65.5) | 3,992 (44.5) |
| Read # | 122,097 | 120,305 (98.5) | 119,666 (98.0) | 119,651 (98.0) | 118,267 (96.9) | 95,822 (78.5) | 95,112 (77.9) |
*The values in parentheses represent percentages of the number of nonchimera OTUs or reads. ^a Nonchimeras were selected by de novo Uchime (cut-off score = 0.6) and DB Uchime (cut-off score = 0.6) searches except in the case of SG.6_C1697, which was recognized as a nonchimera by BLAST and RDP Seqmatch searches. ^b Identification to known taxonomic groups at these hierarchy levels was performed by RDP Classifier with a confidence threshold of 80%. ^c Identification to known species was performed by RDP Seqmatch with an S_ab score higher than 0.9.
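The percentages in parentheses can be reproduced from the counts in the table. The short sketch below recomputes them as a check; the variable and function names are illustrative, and the counts are copied from the table above.

```python
def percent_of(count, total):
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100.0 * count / total, 1)

nonchimera_seqs = 8966     # Sequence # in the Nonchimera column
nonchimera_reads = 122097  # Read # in the Nonchimera column

seq_counts = {"Phylum": 8782, "Class": 8689, "Order": 8677,
              "Family": 8309, "Genus": 5869, "Species": 3992}
read_counts = {"Phylum": 120305, "Class": 119666, "Order": 119651,
               "Family": 118267, "Genus": 95822, "Species": 95112}

seq_pct = {rank: percent_of(n, nonchimera_seqs) for rank, n in seq_counts.items()}
read_pct = {rank: percent_of(n, nonchimera_reads) for rank, n in read_counts.items()}

for rank in seq_counts:
    print(f"{rank}: {seq_pct[rank]}% of sequences, {read_pct[rank]}% of reads")
```

The gap between the two rows at the species level (44.5% of sequences but 77.9% of reads) shows that the sequences identified to species are, on average, the more abundant ones.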
Species commonly detected in more than half of the 28 Singaporean subjects
| Rank^a | Species | Abundance^b (%) | Carriers^c | Metagenome rank^d |
| 1 | | 9.27 | 28 | 1 |
| 2 | | 5.17 | 23 | 4 |
| 3 | | 4.90 | 22 | 20 |
| 4 | | 4.90 | 25 | 31 |
| 5 | | 4.01 | 28 | NL |
| 6 | | 3.66 | 27 | NL |
| 7 | | 2.72 | 22 | 61 |
| 8 | | 2.32 | 27 | NL |
| 9 | | 2.04 | 23 | 11 |
| 10 | | 1.95 | 22 | 27 |
| 11 | | 1.72 | 20 | NL |
| 12 | | 1.70 | 20 | NL |
| 13 | | 1.67 | 22 | NL |
| 14 | | 1.57 | 27 | NL |
| 15 | | 1.56 | 28 | 6 |
| 16 | | 1.35 | 14 | 33 |
| 17 | | 1.27 | 21 | 13 |
| 18 | | 1.09 | 27 | 17 |
| 19 | | 0.91 | 26 | 30 |
| 20 | | 0.84 | 21 | 64 |
| 21 | | 0.76 | 19 | 55 |
| 22 | | 0.73 | 26 | NL |
| 23 | | 0.70 | 25 | 23 |
| 24 | | 0.67 | 14 | NL |
| 25 | | 0.64 | 23 | NL |
| 26 | | 0.62 | 23 | NL |
| 27 | | 0.61 | 27 | 21 |
| 28 | | 0.58 | 27 | NL |
| 29 | | 0.56 | 20 | NL |
| 30 | | 0.55 | 22 | NL |
| 31 | | 0.44 | 16 | 40 |
| 32 | | 0.37 | 23 | 3 |
| 33 | | 0.36 | 17 | 69 |
| 34 | | 0.31 | 17 | 28 |
| 35 | | 0.30 | 22 | NL |
| 36 | | 0.29 | 15 | 19 |
| 37 | | 0.28 | 17 | 32 |
| 38 | | 0.25 | 28 | 26 |
| 39 | | 0.21 | 17 | 53 |
| 40 | | 0.19 | 16 | NL |
| 41 | | 0.18 | 15 | NL |
| 42 | | 0.17 | 25 | NL |
| 43 | | 0.17 | 17 | NL |
| 44 | | 0.14 | 20 | 35 |
| 45 | | 0.14 | 23 | NL |
| 46 | | 0.11 | 20 | NL |
| 47 | | 0.10 | 24 | NL |
| 48 | | 0.09 | 14 | NL |
| 49 | | 0.08 | 17 | NL |
| 50 | | 0.05 | 14 | NL |
| 51 | | 0.05 | 15 | 46 |
| 52 | | 0.05 | 17 | NL |
| 53 | | 0.03 | 17 | NL |
^a Ranked by the relative abundance in the third column. ^b Average of relative abundance among the 28 Singaporean subjects. ^c The number of subjects in whom the species was detected. ^d Rank in the metagenomic catalogue of the study of 124 European individuals (13). NL means not listed in the catalogue.
Fig. 3. Population distribution of 37 common genera (A), 19 common families (B) and 4 common phyla (C) among the 28 Singaporean subjects. The relative abundance of each taxonomic group was calculated by dividing the read counts of identified sequences by the individual's total read number. The 37 genera, 19 families, and 4 phyla were selected because they were detected in more than half of our Singaporean subjects.
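The two calculations in the Fig. 3 caption (relative abundance per subject, and selection of taxa detected in more than half of the subjects) can be sketched as below. This is a minimal illustration under stated assumptions: the function names, the subject labels and the read counts are hypothetical, and "detected" is taken to mean at least one read.

```python
from collections import defaultdict

def relative_abundance(taxon_reads, total_reads):
    """Divide each taxon's read count by the subject's total read number."""
    return {taxon: n / total_reads for taxon, n in taxon_reads.items()}

def common_taxa(per_subject_reads, min_fraction=0.5):
    """Return taxa detected (>= 1 read, assumed) in more than min_fraction of subjects."""
    n_subjects = len(per_subject_reads)
    carriers = defaultdict(int)
    for reads in per_subject_reads.values():
        for taxon, n in reads.items():
            if n > 0:
                carriers[taxon] += 1
    return sorted(t for t, c in carriers.items() if c > min_fraction * n_subjects)

# Hypothetical read counts for three subjects, for illustration only
per_subject_reads = {
    "S1": {"Bacteroides": 50, "Bifidobacterium": 10},
    "S2": {"Bacteroides": 30, "Prevotella": 40},
    "S3": {"Bacteroides": 20, "Bifidobacterium": 5},
}
totals = {"S1": 100, "S2": 100, "S3": 50}  # total reads, including unidentified ones

abundance_s1 = relative_abundance(per_subject_reads["S1"], totals["S1"])
selected = common_taxa(per_subject_reads)
print(abundance_s1, selected)
```

The total read number is passed separately because, as the caption states, the denominator is the individual's total read count rather than the sum of identified reads.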
Fig. 4. Comparison of the relative abundances of Bacteroides (A) and Bifidobacterium (B) determined by 16S rRNA amplicon pyrosequencing with those determined by quantitative real-time PCR. The relative abundance in the pyrosequencing data was calculated by dividing the number of reads identified to genus Bacteroides or Bifidobacterium by the total read counts in each subject. In the quantitative PCR, group-specific primers targeting the Bacteroides fragilis group and genus Bifidobacterium were used, respectively [5].
Summary of the computational processing of 454 pyrosequence data
| Step | Process | Program | Program source (URL or e-mail) | Computer | # of queries | Output | Time |
| 1 | Barcode sorting | split_libraries.py | QIIME (http://qiime.org/scripts/split_libraries.html) | Windows 64 bit PC* | 1,583,218** | 106 to 8,618 reads per subject | 10 min |
| 2 | OTU clustering | pick_otus_through_otu_table.py | QIIME (http://qiime.org/scripts/pick_otus_through_otu_table.html) | Windows 64 bit PC | 131,768 | 15,578 nonredundant seqs. | 17 min |
| 3 | de novo chimera check | de novo Uchime | Mothur 1.25.1 (http://www.mothur.org/wiki/Download_mothur) | Windows 64 bit PC | 15,578 | 7,200 chimeric seqs. | 6 min |
| 4 | DB chimera check | DB Uchime | Mothur 1.25.1 (http://www.mothur.org/wiki/Download_mothur) | Windows 64 bit PC | 15,578 | 7,852 chimeric seqs. (6,612 chimeric seqs.)*** | 7 hrs |
| 5 | Sequence search up to genus level | RDP Classifier | RDP II (http://rdp.cme.msu.edu/) | RDP host computer | 8,966 | 9 phyla–109 genera | 10 min |
| 6 | Sequence search at species level | RDP Seqmatch | RDP II (http://rdp.cme.msu.edu/) | RDP host computer | 8,966 | 3,992 seqs. identified with known species | 3 hrs |
| 7 | Data processing | SeqmatchQ400 | Kyushu Univ. (nakayama @ agr.kyushu-u.ac.jp) | Windows 64 bit PC | 8,945 | 276 species | 30 min |
*Intel Core i7-3930K CPU (3.20 GHz). **Batch sequence data included all sequences from 2 x half PicoTiterPlate regions, which contained 256 samples including non-Singaporean samples. ***The number of chimeric sequences determined by taking into account both the de novo and DB chimera checks.
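The counts and times in the summary table can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 6,612 sequences in step 4 are those flagged by both the de novo and DB checks (the table does not state the combination rule explicitly); the variable names are illustrative.

```python
nonredundant = 15578     # queries entering the chimera checks (steps 3 and 4)
denovo_chimeras = 7200   # flagged by de novo Uchime
db_chimeras = 7852       # flagged by DB Uchime
both_chimeras = 6612     # combined count, assumed to mean "flagged by both"

# Sequences left after removing those flagged by both checks
nonchimeras = nonredundant - both_chimeras

# Inclusion-exclusion: sequences flagged by at least one check
flagged_by_either = denovo_chimeras + db_chimeras - both_chimeras

# Reported time per step, in minutes (7 hrs and 3 hrs converted)
step_minutes = [10, 17, 6, 7 * 60, 10, 3 * 60, 30]
hours, minutes = divmod(sum(step_minutes), 60)

print(nonchimeras, flagged_by_either, f"{hours} h {minutes} min")
```

Under that assumption the remaining 8,966 sequences match the number of queries submitted to RDP Classifier and RDP Seqmatch in steps 5 and 6, and the seven steps add up to roughly 11 hours of processing, most of it in the DB chimera check and the species-level search.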