Chung-Yen Chen, Sen-Lin Tang, Seng-Cho T Chou.
Abstract
BACKGROUND: Metagenomics experiments often draw inferences about microbial communities by sequencing 16S and 18S rRNA genes, and taxonomic assignment is a fundamental step in such studies. This paper addresses weaknesses in two types of metrics that previous studies commonly used to measure the performance of existing taxonomic assignment methods: sequence-count-based metrics and binary error measurement. These metrics made performance evaluation results biased, less informative, and not comparable with one another.
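To make the first criticism concrete, the toy comparison below scores the same predictions in two ways: once per test sequence and once per taxon. The data and the per-taxon averaging are our own illustration, not the paper's metric definitions. When one taxon dominates the test set, the per-sequence score mostly reflects that taxon.

```python
from collections import defaultdict

def sequence_count_accuracy(results):
    """Fraction of (true, predicted) pairs that match: every sequence counts once."""
    return sum(1 for true, pred in results if true == pred) / len(results)

def taxon_averaged_accuracy(results):
    """Mean of per-taxon accuracies: every true taxon counts once."""
    per_taxon = defaultdict(lambda: [0, 0])  # true taxon -> [correct, total]
    for true, pred in results:
        per_taxon[true][1] += 1
        if true == pred:
            per_taxon[true][0] += 1
    return sum(c / n for c, n in per_taxon.values()) / len(per_taxon)

# Hypothetical imbalanced test set: taxon A has 90 sequences, B and C have 5 each,
# and the classifier predicts A for every sequence.
results = [("A", "A")] * 90 + [("B", "A")] * 5 + [("C", "A")] * 5
print(sequence_count_accuracy(results))  # 0.9
print(taxon_averaged_accuracy(results))  # 0.333... (only one of three taxa is correct)
```

A classifier that labels everything as the dominant taxon scores 0.9 per sequence but only 1/3 when each taxon counts equally.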
Keywords: Classification; Data analysis; Metagenomics; Performance evaluation
Year: 2019 PMID: 31185897 PMCID: PMC6561758 DOI: 10.1186/s12859-019-2896-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Cumulative sequence fractions for the common imbalanced 16S and 18S databases. (A balanced data set would follow a 45-degree line.)
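One way to compute such a curve is sketched below, under two assumptions not stated in the caption: taxa are sorted from largest to smallest, and the x-axis is the cumulative fraction of taxa. Under those assumptions, equal-sized taxa fall exactly on the 45-degree line, while a dominant taxon pushes the curve above it.

```python
from fractions import Fraction

def cumulative_sequence_fractions(counts):
    """Return (cumulative taxa fraction, cumulative sequence fraction) points.

    counts: number of sequences per taxon. Taxa are sorted largest first,
    which is an assumption made for this illustration.
    """
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    points, running = [], 0
    for i, c in enumerate(ordered, start=1):
        running += c
        points.append((Fraction(i, len(ordered)), Fraction(running, total)))
    return points

balanced = [5, 5, 5, 5]     # every taxon has the same number of sequences
imbalanced = [85, 5, 5, 5]  # one dominant taxon
print(cumulative_sequence_fractions(balanced))    # every point satisfies y == x
print(cumulative_sequence_fractions(imbalanced))  # 1/4 of taxa hold 17/20 of sequences
```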
Fig. 2 An example of a taxonomic tree. (For simplicity, only four ranks are shown.)
Fig. 5 Evaluation results for the RDPNBC and Plateau methods using error rate and ATD
An example TD calculation. (T1–T6 are the six taxonomic labels shown in Fig. 2.)
| | T1 | T2 | T3 | T4 | T5 | T6 |
|---|---|---|---|---|---|---|
| T1 | 0 | 1/3 | 2/3 | 1/3 | 2/3 | 3/4 |
| T2 | | 0 | 2/3 | 1/3 | 2/3 | 3/4 |
| T3 | | | 0 | 2/3 | 1/3 | 1/4 |
| T4 | | | | 0 | 1/2 | 3/4 |
| T5 | | | | | 0 | 2/4 |
| T6 | | | | | | 0 |
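The exact TD formula is not part of this excerpt. One definition that reproduces every entry of the TD table is: count the ranks below the lowest common ancestor of the two labels and divide by the depth of the deeper label, so a rank present in only one label counts as a mismatch. The sketch below implements that reading. The lineages form a hypothetical tree inferred from the table (with a shared top rank named R); they are not a copy of Fig. 2, and the function name is ours.

```python
from fractions import Fraction

def taxonomic_distance(a, b):
    """Hypothetical TD between two lineages (tuples of names, highest rank first).

    Ranks below the lowest common ancestor, divided by the depth of the deeper
    lineage; a rank present in only one lineage counts as a mismatch.
    """
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    depth = max(len(a), len(b))
    return Fraction(depth - shared, depth)

# Hypothetical tree inferred from the table: R is a top rank shared by all labels.
LINEAGES = {
    "T1": ("R", "T4", "T1"),
    "T2": ("R", "T4", "T2"),
    "T3": ("R", "T5", "T3"),
    "T4": ("R", "T4"),
    "T5": ("R", "T5"),
    "T6": ("R", "T5", "T3", "T6"),
}

# Print the upper triangle, mirroring the table layout.
names = sorted(LINEAGES)
for i, a in enumerate(names):
    row = [str(taxonomic_distance(LINEAGES[a], LINEAGES[b])) for b in names[i:]]
    print(a, " ".join(row))
```

`Fraction` reduces 2/4 to 1/2, so the T5–T6 entry prints as 1/2. Under this reading, a one-rank error on a six-rank lineage gives 1/6 ≈ 0.167, consistent with the 0.16 TD quoted in the Fig. 3 caption.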
Fig. 3 Example of an ATD plot. The plot shows that the method correctly classified about half of the taxa in the RDP database and classified about a third of the taxa with a TD of 0.16 (a 1-rank error).
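The caption suggests that an ATD plot reports the share of taxa at each TD level. The sketch below assumes, without confirmation from this excerpt, that each taxon's TD is first averaged over its test sequences and that the plot shows the fraction of taxa at each averaged value. The function names and records are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def per_taxon_mean_td(records):
    """records: (true_taxon, td) pairs, one per test sequence -> {taxon: mean TD}."""
    sums = defaultdict(Fraction)  # Fraction() is 0
    counts = Counter()
    for taxon, td in records:
        sums[taxon] += Fraction(td)
        counts[taxon] += 1
    return {taxon: sums[taxon] / counts[taxon] for taxon in sums}

def taxa_fraction_by_td(mean_td):
    """(mean TD, fraction of taxa) pairs sorted by TD: the data behind an ATD-style plot."""
    histogram = Counter(mean_td.values())
    n_taxa = len(mean_td)
    return sorted((td, Fraction(count, n_taxa)) for td, count in histogram.items())

# Made-up results for six taxa, with TD values as in a six-rank taxonomy.
records = [
    ("a", 0), ("a", 0), ("b", 0), ("c", 0),
    ("d", Fraction(1, 6)), ("e", Fraction(1, 6)), ("e", Fraction(1, 6)),
    ("f", Fraction(1, 3)), ("f", Fraction(2, 3)),
]
for td, share in taxa_fraction_by_td(per_taxon_mean_td(records)):
    print("TD", td, "->", share, "of taxa")
```

With these invented records, half of the taxa have a mean TD of 0 and a third have 1/6, the same pattern the caption describes for RDP.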
Summary of the rRNA gene databases used in this study
| Database | Version | Sequence Type | Sequences | Taxa | Singletons |
|---|---|---|---|---|---|
| RDP | V16 | 16S | 13,212 | 2472 | 1119 |
| Greengenes | Aug2013 | 16S | 203,452 | 5405 | 2078 |
| SILVA | V128 | 16S & 18S | 190,061 | 2078 | 1920 |
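Two ratios follow directly from the database summary: the mean number of sequences per taxon and the share of taxa that are singletons (assuming, as is usual, that a singleton is a taxon represented by exactly one sequence). The dictionary below copies the table's values; the helper name is ours.

```python
# Sequence, taxon, and singleton counts copied from the database summary table.
DATABASES = {
    "RDP": {"sequences": 13212, "taxa": 2472, "singletons": 1119},
    "Greengenes": {"sequences": 203452, "taxa": 5405, "singletons": 2078},
    "SILVA": {"sequences": 190061, "taxa": 2078, "singletons": 1920},
}

def summarize(db):
    """Return (mean sequences per taxon, share of taxa that are singletons)."""
    return db["sequences"] / db["taxa"], db["singletons"] / db["taxa"]

for name, db in DATABASES.items():
    mean_size, singleton_share = summarize(db)
    print(f"{name}: {mean_size:.1f} sequences per taxon, {singleton_share:.1%} singleton taxa")
```

A mean taxon size hides how unevenly sequences are spread; Fig. 1 shows that distribution directly.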
Settings for the chosen taxonomic assignment methods
| Method | Word length | Other parameters | Implemented by |
|---|---|---|---|
| KNN | 8 | | Mothur v.1.39.5 |
| 1NN | 8 | numwanted = 1 | Mothur v.1.39.5 |
| SINTAX | 8 | cutoff = 0 | USEARCH v9.2 |
| RDPNBC | 8 | cutoff = 0 | Mothur v.1.39.5 |
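In k-mer-based classifiers, the word length is the length of the overlapping subsequences (k-mers) compared between a query and the reference sequences. As a hypothetical illustration of a 1NN classifier with word length 8, the sketch below scores each reference by the fraction of the query's distinct 8-mers it shares and returns the best reference's taxonomy. This is a simplified stand-in, not the Mothur or USEARCH implementation; the function names, sequences, and labels are ours.

```python
def kmers(seq, word_length=8):
    """Distinct overlapping words (k-mers) of the given length in a sequence."""
    return {seq[i:i + word_length] for i in range(len(seq) - word_length + 1)}

def nearest_neighbor_label(query, references, word_length=8):
    """Hypothetical 1NN: taxonomy of the reference sharing the largest fraction of the
    query's k-mers. references is a list of (sequence, taxonomy) pairs; ties keep
    the earlier reference."""
    query_words = kmers(query, word_length)
    if not query_words:
        raise ValueError("query is shorter than the word length")
    best_label, best_score = None, -1.0
    for seq, label in references:
        score = len(query_words & kmers(seq, word_length)) / len(query_words)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

references = [
    ("ACGTACGTTGCAACGT", "Bacteria;Firmicutes"),
    ("TTTTGGGGCCCCAAAA", "Bacteria;Proteobacteria"),
]
print(nearest_neighbor_label("ACGTACGTTGCAACGA", references))  # Bacteria;Firmicutes
```

Mothur's KNN with a larger `numwanted` combines several neighbours into a consensus taxonomy; this sketch keeps only the single best match, which corresponds to the 1NN setting.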
Fig. 4 Evaluation results from testing RDPNBC on the RDP database
Fig. 6 (a) Performance of four methods tested on three databases. (b) Standard deviation of each metric in (a)
Fig. 7 ATD plots on (a) RDP, (b) Greengenes, and (c) SILVA