R Henrik Nilsson, Leho Tedersoo, Martin Ryberg, Erik Kristiansson, Martin Hartmann, Martin Unterseher, Teresita M Porter, Johan Bengtsson-Palme, Donald M Walker, Filipe de Sousa, Hannes Andres Gamper, Ellen Larsson, Karl-Henrik Larsson, Urmas Kõljalg, Robert C Edgar, Kessy Abarenkov.
Abstract
The nuclear ribosomal internal transcribed spacer (ITS) region is the most commonly chosen genetic marker for the molecular identification of fungi in environmental sequencing and molecular ecology studies. Several analytical issues complicate such efforts, one of which is the formation of chimeric (artificially joined) DNA sequences during PCR amplification or sequence assembly. Several software tools are currently available for chimera detection, but they rely to various degrees on the presence of a chimera-free reference dataset for optimal performance. However, no such dataset is available for use with the fungal ITS region. This study introduces a comprehensive, automatically updated reference dataset for fungal ITS sequences based on the UNITE database for the molecular identification of fungi. This dataset supports chimera detection throughout the fungal kingdom and for full-length ITS sequences as well as partial (ITS1 or ITS2 only) datasets. The performance of the dataset on a large set of artificial chimeras was above 99.5%, and we subsequently used the dataset to remove nearly 1,000 compromised fungal ITS sequences from public circulation. The dataset is available at http://unite.ut.ee/repository.php and is subject to web-based third-party curation.
Year: 2015 PMID: 25786896 PMCID: PMC4462924 DOI: 10.1264/jsme2.ME14121
Source DB: PubMed Journal: Microbes Environ ISSN: 1342-6311 Impact factor: 2.912
Fig. 1. A genus-level alignment in UNITE of the ectomycorrhizal genus Hydnum, with the individual species hypotheses (SHs) indicated by the colored boxes at different similarity levels (97–100%). One sequence (shown here in green) from each such species hypothesis was used to build the chimera reference dataset. Manually chosen reference sequences are indicated by filled circles in the SH column; these superseded the automatic choice of representative sequences for species hypotheses and are particularly suited for sequences from type (or otherwise authenticated) material. Two sequences from type specimens are indicated in the figure.
Fig. 2. Boxplots of UCHIME scores for non-chimeric and chimeric sequences. All sequences were included in panel (a), while sequences with low read quality were removed in panel (b). The score difference between the non-chimeric and chimeric sequences was statistically assessed through the Wilcoxon-Mann-Whitney test.
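
The comparison described for Fig. 2 can be illustrated with a minimal sketch of a two-sided Wilcoxon-Mann-Whitney test. This is not the authors' analysis code: the function names and the score values are hypothetical, and the p-value uses the plain normal approximation without tie or continuity correction, which is only a rough guide for small samples.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail probability of N(0, 1)
    return u_x, p_value


# Hypothetical chimera scores, invented purely for illustration.
non_chimeric_scores = [0.01, 0.02, 0.05, 0.03, 0.00]
chimeric_scores = [1.2, 3.4, 0.9, 5.1, 2.2]

u_stat, p_value = mann_whitney_u(non_chimeric_scores, chimeric_scores)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

A U statistic of zero for the first group means that every score in it ranks below every score in the second group, which is the complete separation one would hope to see between non-chimeric and chimeric sequences.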