Kristi Gdanetz, Gian Maria Niccolò Benucci, Natalie Vande Pol, Gregory Bonito.
Abstract
BACKGROUND: One of the most crucial steps in high-throughput sequence-based microbiome studies is the taxonomic assignment of sequences belonging to operational taxonomic units (OTUs). Without taxonomic classification, functional and biological information of microbial communities cannot be inferred or interpreted. The internal transcribed spacer (ITS) region of the ribosomal DNA is the conventional marker region for fungal community studies. While bioinformatics pipelines that cluster reads into OTUs have received much attention in the literature, less attention has been given to the taxonomic classification of these sequences, upon which biological inference is dependent.
Keywords: ITS; RDP; SINTAX; UNOISE; UPARSE; fungal microbiome; mycobiome; taxonomy classifiers
Year: 2017 PMID: 29212440 PMCID: PMC5719527 DOI: 10.1186/s12859-017-1952-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Types of classifications
| Present in the database? | Taxon name given? | Correct name given? | Result | Error Type |
|---|---|---|---|---|
| Yes | Yes | Yes | Good assignment | True positive |
| Yes | Yes | No | Misclassification | False positive |
| Yes | No | No | Underclassification | False negative |
| No | Yes | No | Overclassification | False positive |
| No | No | No | Good assignment | True negative |
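The decision table above can be read as a small lookup. The following is a minimal sketch, not code from CONSTAX; the function name `classify_outcome` and its arguments are hypothetical, and it treats any name given for a taxon absent from the database as a false positive.

```python
from __future__ import annotations


def classify_outcome(present_in_db: bool, name_given: bool,
                     name_correct: bool) -> tuple[str, str]:
    """Return (result, error type) for one assignment at one rank.

    Hypothetical helper that mirrors the rows of the table above.
    """
    if name_correct and not (present_in_db and name_given):
        # A correct name requires that a name was given and the taxon exists.
        raise ValueError("a correct name implies a reference and a given name")
    if name_given:
        if name_correct:
            return ("Good assignment", "True positive")
        if present_in_db:
            # Reference exists, but the wrong name was reported.
            return ("Misclassification", "False positive")
        # No reference exists, yet a name was reported anyway.
        return ("Overclassification", "False positive")
    if present_in_db:
        # Reference exists, but the classifier declined to name the taxon.
        return ("Underclassification", "False negative")
    return ("Good assignment", "True negative")


if __name__ == "__main__":
    for row in [(True, True, True), (True, True, False), (True, False, False),
                (False, True, False), (False, False, False)]:
        print(row, classify_outcome(*row))
```

Both overclassification and misclassification report a wrong name, which is why the sketch groups them under the same error type.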
Sample origins, barcode regions, and accession numbers for datasets
| Dataset | Gene Region | Read Type | Sample Origin | Data Availability | Reference |
|---|---|---|---|---|---|
| ITS1-Soil | ITS1 | 2 × 250 bp | North American soil | NCBI SRA SRP035367 | Smith & Peay [36] |
| ITS2-Soil | ITS2 | 2 × 250 bp | North American soil | NCBI SRA SRR1508275 | Oliver et al. [37] |
| ITS1-Plant | ITS1 | 2 × 250 bp | European plants | MG-RAST 13322 | Agler et al. [10] |
| ITS2-Plant | ITS2 | 2 × 250 bp | European plants | MG-RAST 13322 | Agler et al. [10] |
| ITS1-BC<sup>a</sup> | ITS1 | 1 × 300 bp | North American soil | NCBI SRA SRP079401 | Benucci et al., unpublished |
| ITS2-BC<sup>a</sup> | ITS2 | 1 × 300 bp | North American soil | NCBI SRA SRP079401 | Benucci et al., unpublished |
| ITS1-UN<sup>b</sup> | ITS1 | 1 × 300 bp | North American soil | NCBI SRA SRP079401 | Benucci et al., unpublished |
| ITS2-UN<sup>b</sup> | ITS2 | 1 × 300 bp | North American soil | NCBI SRA SRP079401 | Benucci et al., unpublished |

<sup>a</sup> Data processed with the UPARSE algorithm; OTUs generated by clustering
<sup>b</sup> Data processed with the UNOISE algorithm; ESVs generated by splitting
Fig. 1 Overview of the CONSTAX workflow. Bubbles highlighted by the gray box are automated through constax.sh
Rules adopted to generate the combined taxonomy table
| RDP | UTAX | SINTAX | CONSENSUS |
|---|---|---|---|
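| 3 taxonomy assignments | | | |
| Taxon A | Taxon A | Taxon A | Taxon A |
| Taxon A | Taxon A | Taxon B | Taxon A |
| Taxon A | Taxon B | Taxon C | Use score |
| 2 taxonomy assignments + 1 unidentified | | | |
| Taxon A | Taxon A | Unidentified | Taxon A |
| Taxon A | Taxon B | Unidentified | Use score |
| 1 taxonomy assignment + 2 unidentified | | | |
| Taxon A | Unidentified | Unidentified | Taxon A |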
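One way to read these rules as code is sketched below. It is an illustration under stated assumptions, not the constax.sh implementation: a taxon named by at least two classifiers is taken as the consensus, a lone named taxon is kept, and any disagreement without a majority falls back to the highest score. The function name `consensus`, the `(taxon, score)` input layout, the assumption that scores are comparable across classifiers, and the handling of three unidentified calls (not covered by the table) are all hypothetical.

```python
from __future__ import annotations

from collections import Counter

UNIDENTIFIED = "Unidentified"


def consensus(calls: dict[str, tuple[str, float]]) -> str:
    """Combine per-classifier calls at one rank into a single taxon.

    `calls` maps a classifier name (e.g. "RDP") to (taxon, score).
    Assumption: scores from different classifiers can be compared directly.
    """
    # Keep only the classifiers that actually named a taxon.
    named = [(taxon, score) for taxon, score in calls.values()
             if taxon != UNIDENTIFIED]
    if not named:
        # Case not listed in the rules table; assumed to stay unidentified.
        return UNIDENTIFIED
    top_taxon, top_count = Counter(t for t, _ in named).most_common(1)[0]
    if top_count >= 2 or len(named) == 1:
        # Agreement between at least two classifiers, or a single assignment.
        return top_taxon
    # Every named taxon differs: "use score".
    return max(named, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    print(consensus({"RDP": ("Taxon A", 0.9),
                     "UTAX": ("Taxon B", 0.8),
                     "SINTAX": (UNIDENTIFIED, 0.0)}))
```

Applying the function rank by rank to each OTU would yield one combined taxonomy row per OTU.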
Fig. 2 Power of classifiers. Distribution of classified and unclassified OTUs for each classifier and across taxonomic levels. a ITS1-soil dataset from Smith & Peay [36]. b ITS2-soil dataset from Oliver et al. [37]. c ITS1-plant and d ITS2-plant datasets from Agler et al. [10]
Range of percent improvement using CONSTAX
| Taxonomic Rank<sup>a</sup> | Percent Increase<sup>b</sup> | ITS1-Soil | ITS2-Soil | ITS1-Plant | ITS2-Plant | ITS1-BC<sup>c</sup> | ITS2-BC<sup>c</sup> | ITS1-UN<sup>c,d</sup> | ITS2-UN<sup>c,d</sup> | Mean Increase |
|---|---|---|---|---|---|---|---|---|---|---|
| Kingdom | min. | 0.00 | 0.00 (1.60) | 0.00 (0.20) | 0.00 | 0.00 (1.00) | 0.00 (0.20) | 0.00 (2.20) | 0.00 | 0.81 (1.14) |
| | max. | 0.40 | 3.00 | 1.40 | 0.20 | 3.60 | 1.40 | 2.60 | 0.40 | |
| Phylum | min. | 0.47 | 1.29 | 4.46 (5.25) | 2.84 | 6.05 | 1.70 | 5.57 (7.62) | 2.76 | 6.83 (6.50) |
| | max. | 5.21 (4.03) | 13.18 (5.43) | 17.06 | 18.01 | 11.46 | 12.24 | 11.73 | 8.90 | |
| Class | min. | 1.58 | 1.83 | 0.93 | 4.89 | 3.04 | 0.84 | 3.98 | 3.70 | 8.56 (5.36) |
| | max. | 9.23 (2.11) | 24.77 (7.65) | 21.98 (18.27) | 26.63 (22.28) | 18.26 (8.70) | 9.28 (5.49) | 13.94 (5.98) | 9.26 (5.19) | |
| Order | min. | 1.69 | 2.47 | 0.33 | 2.58 | 2.48 | 1.40 | 5.75 | 3.23 | 10.94 (5.73) |
| | max. | 11.83 (4.79) | 27.21 (6.71) | 37.42 (20.86) | 42.58 (24.52) | 19.31 (7.92) | 7.44 (5.58) | 19.91 (8.85) | 11.29 (4.03) | |
| Family | min. | 1.36 | 3.54 | 2.47 | 3.16 | 1.86 | 1.10 | 6.88 | 6.13 | 15.72 (6.43) |
| | max. | 13.90 (1.69) | 30.81 (6.57) | 58.02 (22.63) | 61.05 (26.32) | 20.50 (9.94) | 11.60 (6.08) | 32.80 (8.99) | 27.83 (7.08) | |
| Genus | min. | 2.56 | 6.25 | 2.51 | 5.33 | 2.40 | 1.89 | 9.15 | 9.03 | 27.06 (9.04) |
| | max. | 28.21 (3.85) | 53.13 (8.59) | 88.94 (37.69) | 85.33 (36.00) | 31.20 (7.20) | 35.22 (10.69) | 63.38 (10.56) | 62.58 (9.03) | |
| Species | min. | 5.65 | 8.51 | 2.70 | 1.92 | 3.19 | 1.83 | 9.28 | 13.68 | 34.65 (13.20) |
| | max. | 52.42 (16.13) | 65.96 (14.89) | 98.65 (47.97) | 96.15 (51.92) | 41.49 (9.57) | 51.38 (21.10) | 81.44 (11.34) | 89.47 (17.89) | |

<sup>a</sup> Percent improvement calculated with RDP, SINTAX, and UTAX outputs (numbers in parentheses calculated without including UTAX; only differing values displayed). Ranges represent the minimum and maximum improvement when compared to all three classifiers at a given level
<sup>b</sup> Equation to calculate percent increase, where N = assigned OTUs.
<sup>c</sup> Reads are forward (ITS1) or reverse (ITS2), not merged read pairs
<sup>d</sup> Dataset was processed with denoising instead of clustering
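The equation referred to in the footnotes is not reproduced in this text. As a hedged illustration only, the sketch below assumes the form 100 × (N_CONSTAX − N_classifier) / N_CONSTAX, which reproduces the ITS1-Soil species values in the table when N_CONSTAX is taken as 124, the sum of the classified OTUs at species rank (57 + 42 + 25) in the distribution table for the same dataset. The function name and the per-classifier counts (117, 104, 59) are hypothetical values chosen to match the reported percentages, not figures taken from the source.

```python
def percent_increase(n_constax: int, n_classifier: int) -> float:
    """Percent increase in assigned OTUs of the consensus over one classifier.

    Assumed form (not confirmed by the source):
        100 * (N_constax - N_classifier) / N_constax
    """
    if n_constax <= 0:
        raise ValueError("n_constax must be positive")
    return 100.0 * (n_constax - n_classifier) / n_constax


if __name__ == "__main__":
    n_constax = 57 + 42 + 25  # ITS1-Soil, species rank: 124 assigned OTUs
    for n_classifier in (117, 104, 59):  # hypothetical per-classifier counts
        print(n_classifier, round(percent_increase(n_constax, n_classifier), 2))
```

Under this assumed form, the three hypothetical counts give 5.65, 16.13, and 52.42, which match the species-rank entries reported for ITS1-Soil.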
Distribution of identically classified, uniquely classified, and unidentified OTUs across all taxonomic ranks for data presented in Fig. 2
| ITS1-Soil | Kingdom | Phylum | Class | Order | Family | Genus | Species |
|---|---|---|---|---|---|---|---|
| 3 classified, identical | 498 | 393 | 339 | 306 | 253 | 167 | 57 |
| 3 classified, 1 unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 classified, 3 unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 classified, identical | 2 | 17 | 31 | 33 | 34 | 53 | 42 |
| 2 classified, unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 classified | 0 | 12 | 9 | 16 | 8 | 14 | 25 |
| RDP | 0 | 0 | 1 | 1 | 5 | 9 | 7 |
| SINTAX | 0 | 11 | 5 | 12 | 3 | 5 | 18 |
| UTAX | 0 | 1 | 3 | 3 | 0 | 0 | 0 |
| Unidentified | 0 | 78 | 121 | 145 | 205 | 266 | 376 |
| ITS2-Soil | | | | | | | |
| 3 classified, identical | 481 | 332 | 242 | 203 | 135 | 60 | 16 |
| 3 classified, 1 unique | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 classified, 3 unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 classified, identical | 2 | 33 | 58 | 57 | 45 | 49 | 20 |
| 2 classified, unique | 7 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 classified | 7 | 22 | 27 | 23 | 18 | 19 | 11 |
| RDP | 0 | 3 | 4 | 4 | 6 | 8 | 4 |
| SINTAX | 7 | 17 | 1 | 17 | 11 | 11 | 7 |
| UTAX | 0 | 2 | 22 | 2 | 1 | 0 | 0 |
| Unidentified | 0 | 113 | 173 | 217 | 302 | 372 | 453 |
| ITS1-Plant | | | | | | | |
| 3 classified, identical | 490 | 304 | 234 | 181 | 98 | 22 | 2 |
| 3 classified, 1 unique | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 classified, 3 unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 classified, identical | 4 | 52 | 45 | 65 | 88 | 97 | 71 |
| 2 classified, unique | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 classified | 0 | 25 | 44 | 56 | 57 | 80 | 75 |
| RDP | 0 | 2 | 1 | 0 | 5 | 5 | 4 |
| SINTAX | 0 | 9 | 41 | 55 | 51 | 75 | 71 |
| UTAX | 0 | 14 | 2 | 1 | 1 | 0 | 0 |
| Unidentified | 0 | 119 | 177 | 198 | 257 | 301 | 352 |
| ITS2-Plant | | | | | | | |
| 3 classified, identical | 499 | 166 | 120 | 83 | 36 | 11 | 2 |
| 3 classified, 1 unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 classified, 3 unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 classified, identical | 1 | 20 | 29 | 36 | 32 | 33 | 22 |
| 2 classified, unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 classified | 0 | 25 | 35 | 36 | 27 | 31 | 28 |
| RDP | 0 | 1 | 2 | 2 | 3 | 4 | 1 |
| SINTAX | 0 | 3 | 31 | 32 | 0 | 28 | 27 |
| UTAX | 0 | 21 | 2 | 2 | 24 | 0 | 0 |
| Unidentified | 0 | 289 | 316 | 345 | 405 | 425 | 448 |
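
The row categories in this table can be derived from the three classifier calls for each OTU at a given rank. The sketch below shows one way to do that tally; the function names and the `(rdp, sintax, utax)` tuple layout are hypothetical and are not taken from the CONSTAX scripts.

```python
from collections import Counter

UNIDENTIFIED = "Unidentified"


def agreement_category(rdp: str, sintax: str, utax: str) -> str:
    """Label one OTU at one rank with the row category used in the table."""
    named = [t for t in (rdp, sintax, utax) if t != UNIDENTIFIED]
    distinct = len(set(named))
    if len(named) == 3:
        if distinct == 1:
            return "3 classified, identical"
        if distinct == 2:
            # Two classifiers agree and one reports a different taxon.
            return "3 classified, 1 unique"
        return "3 classified, 3 unique"
    if len(named) == 2:
        return "2 classified, identical" if distinct == 1 else "2 classified, unique"
    if len(named) == 1:
        return "1 classified"
    return UNIDENTIFIED


def tally(calls):
    """Count categories over an iterable of (rdp, sintax, utax) tuples."""
    return Counter(agreement_category(*c) for c in calls)


if __name__ == "__main__":
    example = [
        ("Taxon A", "Taxon A", "Taxon A"),
        ("Taxon A", "Taxon A", UNIDENTIFIED),
        (UNIDENTIFIED, "Taxon B", UNIDENTIFIED),
        (UNIDENTIFIED, UNIDENTIFIED, UNIDENTIFIED),
    ]
    for category, count in tally(example).items():
        print(category, count)
```

Because every OTU falls into exactly one category, the category counts at each rank sum to the number of OTUs, which gives a quick consistency check on a column of the table.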