Lauren E Eldred, R Greg Thorn, David Roy Smith.
Abstract
Simple nucleotide matching identification methods are not as accurate as once thought at identifying environmental fungal sequences. This is largely because of incorrect naming and the underrepresentation of various fungal groups in reference datasets. Here, we explore these issues by examining an environmental metabarcoding dataset of partial large subunit rRNA sequences of Basidiomycota and basal fungi. We employed the simple matching method using the QIIME 2 classifier and the RDP Classifier in conjunction with the latest releases of the SILVA (138.1, 2020) and RDP (11, 2014) reference datasets and then compared the results with a manual phylogenetic binning approach. Of the 71 query sequences tested, 21% and 42% were misidentified using QIIME 2 and the RDP Classifier, respectively. Of these simple matching misidentifications, more than half resulted from the underrepresentation of various groups of fungi in the SILVA and RDP reference datasets. More comprehensive reference datasets with fewer misidentified sequences will increase the accuracy of simple matching identifications. However, we argue that the phylogenetic binning approach is a better alternative to simple matching since, in addition to better accuracy, it provides evolutionary information about query sequences.
Keywords: Basidiomycota; SILVA; metabarcoding; misidentification; sequence identification
Year: 2021 PMID: 34899856 PMCID: PMC8662557 DOI: 10.3389/fgene.2021.768473
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1 (A) Workflow used for the identifications of the 71 query sequences via simple matching and phylogenetic binning. (B) Step-by-step breakdown of the manual method for phylogenetic binning.
FIGURE 2 Percentage of fungal sequence misidentifications in the SILVA and RDP simple-matching methods caused by improper labelling, underrepresentation of groups of fungi, or both.
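The comparison summarized in the abstract and in Figure 2 (simple-matching labels checked against phylogenetic-binning labels, then misidentifications tallied by cause) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `misidentification_summary`, the query IDs, the taxon labels, and the cause strings are all hypothetical placeholders, and the sketch assumes the phylogenetic-binning result is treated as the reference label for each query.

```python
from collections import Counter


def misidentification_summary(assigned, reference, causes=None):
    """Compare classifier labels with reference labels for the same queries.

    assigned  -- dict: query ID -> taxon from a simple-matching classifier
    reference -- dict: query ID -> taxon accepted as correct (here assumed to
                 come from phylogenetic binning)
    causes    -- optional dict: query ID -> manually assigned cause of error

    Returns (percent misidentified, sorted misidentified IDs, cause counts).
    """
    if not reference:
        raise ValueError("reference labels must not be empty")
    if assigned.keys() != reference.keys():
        raise ValueError("both label sets must cover the same query IDs")

    # A query counts as misidentified when the two labels differ.
    wrong = sorted(q for q in reference if assigned[q] != reference[q])
    rate = 100.0 * len(wrong) / len(reference)

    # Tally causes only for the misidentified queries.
    cause_counts = Counter()
    if causes is not None:
        cause_counts.update(causes.get(q, "unknown") for q in wrong)
    return rate, wrong, cause_counts


# Toy data, invented for illustration only (not taken from the study).
binning = {"q1": "Agaricales", "q2": "Polyporales",
           "q3": "Russulales", "q4": "Boletales"}
classifier = {"q1": "Agaricales", "q2": "Polyporales",
              "q3": "Agaricales", "q4": "Boletales"}
error_causes = {"q3": "underrepresentation"}

rate, wrong, cause_counts = misidentification_summary(
    classifier, binning, error_causes)
print(f"{rate:.1f}% misidentified: {wrong}; causes: {dict(cause_counts)}")
```

Running the same function once per classifier output (for example, one label set from QIIME 2 with SILVA and one from the RDP Classifier) would give per-method percentages of the kind reported above, provided each query has already been assigned a cause by manual inspection.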