
CONSTAX: a tool for improved taxonomic resolution of environmental fungal ITS sequences.

Kristi Gdanetz, Gian Maria Niccolò Benucci, Natalie Vande Pol, Gregory Bonito.

Abstract

BACKGROUND: One of the most crucial steps in high-throughput sequence-based microbiome studies is the taxonomic assignment of sequences belonging to operational taxonomic units (OTUs). Without taxonomic classification, functional and biological information of microbial communities cannot be inferred or interpreted. The internal transcribed spacer (ITS) region of the ribosomal DNA is the conventional marker region for fungal community studies. While bioinformatics pipelines that cluster reads into OTUs have received much attention in the literature, less attention has been given to the taxonomic classification of these sequences, upon which biological inference is dependent.
RESULTS: Here we compare how three common fungal OTU taxonomic assignment tools (RDP Classifier, UTAX, and SINTAX) handle ITS fungal sequence data. The classification power, defined as the proportion of assigned OTUs at a given taxonomic rank, varied among the classifiers. Classifiers were generally consistent (assignment of the same taxonomy to a given OTU) across datasets and ranks; a small number of OTUs were assigned unique classifications across programs. We developed CONSTAX (CONSensus TAXonomy), a Python tool that compares taxonomic classifications of the three programs and merges them into an improved consensus taxonomy. This tool also produces summary classification outputs that are useful for downstream analyses.
CONCLUSIONS: Our results demonstrate that independent taxonomy assignment tools classify unique members of the fungal community, and greater classification power is realized by generating consensus taxonomy of available classifiers with CONSTAX.

Keywords:  ITS; RDP; SINTAX; UNOISE; UPARSE; fungal microbiome; mycobiome; taxonomy classifiers

Year:  2017        PMID: 29212440      PMCID: PMC5719527          DOI: 10.1186/s12859-017-1952-x

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Background

Next-generation sequencing technologies and high-performance computers define the culture-independent era of microbial ecology. High-throughput sequencing of DNA barcode marker regions, namely the bacterial 16S rRNA gene or fungal internal transcribed spacer (ITS) ribosomal regions, has allowed researchers to characterize complex microbial communities at a depth not previously possible with culture-based methods. Hypervariable regions of the 16S rRNA gene have been extensively studied and adopted by researchers to describe prokaryotic microbial communities, and a mix of ribosomal markers has been used to describe fungal communities [1] over the past 25 years [2]. The ITS region, comprising the ITS1, 5.8S, and ITS2 segments, was recently selected as the formal DNA barcode for fungi [3-5], although there is a lack of consensus regarding which ITS segment (ITS1 or ITS2) to utilize as a barcode [6-8]. It remains unclear which of the ITS primer sets has the best resolution for fungal diversity, and papers targeting either ITS segment have been published at near equal frequencies [8-10]. Pipelines for processing fungal ITS amplicon datasets such as CLOTU [11], CloVR-ITS [12], PIPITS [1], and others [13] are available in the literature, but most of the tool-development effort has been directed towards generating nearly automated pipelines for filtering, trimming, and clustering of amplicon reads into operational taxonomic units (OTUs). Less emphasis has been placed on assigning taxonomy to representative OTU sequences in a dataset. Linnaean taxonomy provides a controlled vocabulary that communicates ecological, biological or geographic information. Linking OTUs to functionally meaningful names, which typically depends upon species-level resolution, is key to addressing biological and ecological hypotheses.
Processing sequencing reads, in addition to taxonomy assignment of sequences, can be completed using various bioinformatics pipeline tools. The most popular are Mothur [14], QIIME [15], and USEARCH [16]. A variety of algorithms can be used for the taxonomy assignment step, including BLAST [17], the Ribosomal Database Project (RDP) Naïve Bayesian Classifier [18], UTAX [19], and SINTAX [20]. The RDP Classifier (RDPC) uses Bayesian statistics to find 8-mers that have higher probability of belonging to a given genus. Based on these conditions, RDPC estimates the probability that an unknown query DNA sequence belongs to the genus [18]. The UTAX algorithm looks for k-mer words in common between a query sequence and a known reference sequence, and calculates a score of word counts. The score is used to estimate confidence values for each of the taxonomic levels, which are then trained on the reference database to give an estimate of error rates [19]. The SINTAX algorithm predicts taxonomy by using k-mer similarity to identify the top hit in a reference database, and provides bootstrap confidence for all ranks in the prediction [20]. Local alignment, most commonly implemented in BLAST [17], is still occasionally used for taxonomy assignment of high-throughput sequence datasets. However, use of BLAST to identify OTUs in amplicon-based microbiome datasets has low accuracy, as demonstrated previously [20-22] and discussed by Wang et al. [18]. The UNITE reference database is a curated database of all International Nucleotide Sequence Database Collaboration (INSDC) fungal sequences, and is the most commonly used reference database for fungal amplicon analyses [23-25]. Recently the Ribosomal Database Project released the Warcup Fungal Database [26], a curated version of UNITE and INSDC. Apart from previously published database comparisons which showed the accuracy of the UNITE [23] and Warcup [26] fungal databases, all comparative studies of taxonomy classifiers of which we are aware have analyzed only prokaryotic organisms [22, 27, 28].
Since only a small fraction of microbial species estimated to be on the planet have been described, taxonomic classification is not a trivial task and no algorithm is 100% precise. Several types of classification errors are possible, as highlighted in Table 1. The RDPC, UTAX, and SINTAX classifiers report a confidence value for the classification given to an OTU so that the user can set a cutoff value below which no name is given. Even though a number of databases and tools have been developed to enable high-throughput analyses of environmental sequences, researchers still need to solve the problems caused by misidentified or insufficiently identified sequences [5]. Further, some poorly sampled fungal lineages reduce the ability of a classifier to confidently assign OTUs to the correct fungal lineage regardless of the classification algorithm used.
Table 1

Types of classifications

| Present in the database? | Taxon name given? | Correct name given? | Result | Error Type |
| Yes | Yes | Yes | Good assignment | True positive |
| Yes | Yes | No | Misclassification | False positive |
| Yes | No | No | Underclassification | False negative |
| No | Yes | No | Overclassification | False negative |
| No | No | No | Good assignment | True negative |

This study tested whether established taxonomic classifiers for fungal ITS DNA sequences generate similar profiles of the fungal community. Specifically, we compared the power (proportion of assigned OTUs at a given level) and consistency (agreement of OTU assignment across classifiers) of the RDPC, UTAX, and SINTAX classification algorithms. Power and consistency were compared across i) ITS1 or ITS2 regions, ii) OTU-clustering approaches, and iii) merged or single stranded reads. Further, we created a Python tool that functions independently of OTU-picking method to merge taxonomy assignments from multiple classifier programs into an improved consensus taxonomy, and generates several output files that can be used for subsequent community analysis.
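
The two metrics defined above can be illustrated with a minimal Python sketch. It is not taken from the CONSTAX scripts: the function names, the dictionary layout, and the toy OTUs are assumptions made for illustration, with None standing for an unidentified rank.

```python
# Illustrative sketch only; names and data layout are hypothetical.

def power(assignments, rank):
    """Proportion of OTUs that received a name at `rank`.

    assignments: dict mapping OTU id -> dict of rank -> name (None if unidentified).
    """
    named = sum(1 for lineage in assignments.values()
                if lineage.get(rank) is not None)
    return named / len(assignments)


def consistent(lineages, rank):
    """True if every classifier that named the OTU at `rank` gave the same name."""
    names = {lin[rank] for lin in lineages if lin.get(rank) is not None}
    return len(names) <= 1


# Toy data for two classifiers and two OTUs (hypothetical).
rdp = {"OTU_1": {"Kingdom": "Fungi", "Phylum": "Ascomycota"},
       "OTU_2": {"Kingdom": "Fungi", "Phylum": None}}
sintax = {"OTU_1": {"Kingdom": "Fungi", "Phylum": "Ascomycota"},
          "OTU_2": {"Kingdom": "Fungi", "Phylum": "Basidiomycota"}}

rdp_phylum_power = power(rdp, "Phylum")
sintax_phylum_power = power(sintax, "Phylum")
otu2_agrees = consistent([rdp["OTU_2"], sintax["OTU_2"]], "Phylum")
```

In this sketch an OTU named by only one classifier still counts as consistent, because no two classifiers disagree about it.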

Methods

Data accessibility

Sample origins, barcode regions, and accession numbers for all datasets used in the current study can be found in Table 2. Implementation of the tool presented in this paper requires users to download and install the following software: RDPC [https://github.com/rdpstaff/classifier], USEARCH version 8 for UTAX, and USEARCH version 9 or later for SINTAX [http://drive5.com/usearch/download.html], R v2.15.1 or later [https://www.r-project.org], Python version 2.7 [https://www.python.org]. Detailed installation and analysis instructions, including all custom scripts used in the analysis and a test dataset are available in Additional file 1, or for download from GitHub: []. All of the custom Python scripts described in the methods section can be downloaded from the CONSTAX.tar.gz file (Additional file 2). All the steps described in the methods section are automated through the constax.sh script, but are included as independent scripts in CONSTAX.tar.gz so they can be easily modified to suit the user’s needs. An overview of the data analysis workflow is available in Fig. 1.
Table 2

Sample origins, barcode regions, and accession numbers for datasets

| Dataset | Gene Region | Read Type | Sample Origin | Data Availability | Reference |
| ITS1-Soil | ITS1 | 2 × 250 bp | North American soil | NCBI SRA SRP035367 | Smith & Peay [36] |
| ITS2-Soil | ITS2 | 2 × 250 bp | North American soil | NCBI SRA SRR1508275 | Oliver et al. [37] |
| ITS1-Plant | ITS1 | 2 × 250 bp | European plants | MG-RAST 13322 | Agler et al. [10] |
| ITS2-Plant | ITS2 | 2 × 250 bp | European plants | MG-RAST 13322 | Agler et al. [10] |
| ITS1-BC^a | ITS1 | 1 × 300 bp | North American soil | NCBI SRA SRP079401 | Benucci et al., unpublished |
| ITS2-BC^a | ITS2 | 1 × 300 bp | North American soil | NCBI SRA SRP079401 | Benucci et al., unpublished |
| ITS1-UN^b | ITS1 | 1 × 300 bp | North American soil | NCBI SRA SRP079401 | Benucci et al., unpublished |
| ITS2-UN^b | ITS2 | 1 × 300 bp | North American soil | NCBI SRA SRP079401 | Benucci et al., unpublished |

^a data processed with UPARSE algorithm, OTUs generated with clustering

^b data processed with UNOISE algorithm, ESVs generated with splitting

Fig. 1

Overview of CONSTAX workflow. Bubbles highlighted by gray box are automated through constax.sh

Generation of operational taxonomic units

For the ITS1-soil and ITS2-soil datasets (Table 2), forward and reverse reads were merged with PEAR version 0.9.8 [29]. Merged reads were randomly sampled to one million reads to reduce computational time. Reads were quality-filtered, trimmed, dereplicated, and clustered into OTUs at 97% similarity (the standard sequence similarity value) using USEARCH version 8.1.1831 [16]. Analysis of plant datasets (ITS1-plant and ITS2-plant) began with the processed 97% similarity OTUs provided by the authors [10]. For the ITS1/2-BC and ITS1/2-UN datasets, reads were quality-filtered as above, but the OTU-generation method differed. For ITS1-BC and ITS2-BC, OTUs were called with the UPARSE clustering algorithm [19]. For ITS1-UN and ITS2-UN, the UNOISE2 algorithm [30] was used to denoise reads and generate exact sequence variants (ESVs) [31]. Each set of OTUs or ESVs was randomly sampled to 500 sequences for the comparative taxonomic analysis described in the next section. Sample and abundance data were not used in this study. The code for the OTU-picking pipeline described above is available in Additional file 3.
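
The random subsampling step can be sketched in Python as follows. The text does not state which tool performed the subsampling, so this is only an illustration of sampling without replacement; the function name, the fixed seed, and the placeholder OTU identifiers are assumptions.

```python
import random


def subsample(records, n, seed=0):
    """Draw n records without replacement; return all records if there are n or fewer.

    The seed is an assumption added here so that the draw is reproducible.
    """
    if len(records) <= n:
        return list(records)
    rng = random.Random(seed)
    return rng.sample(records, n)


# Hypothetical identifiers standing in for representative OTU sequences.
otu_ids = ["OTU_%d" % i for i in range(1, 2001)]
picked = subsample(otu_ids, 500)
```

Fixing the seed makes the same 500 sequences available to every classifier, which matters when the outputs are compared OTU by OTU.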

Database formatting and classifier training

The UNITE fungal database [23], release 31-01-2016, containing 23,264 sequences was used in the current study. A custom script (FormatRefDB.py) was developed in Python 2.7 to format the database, starting from the general fasta release, for each classifier to ensure training was completed with identical databases. For RDPC training, custom Python scripts (subscript_lineage2taxonomyTrain.py, subscript_fasta_addFullLineage.py) were used to give each Species Hypothesis a unique name and remove special text characters. Prior to UTAX training and SINTAX classification, custom Python scripts were used to make minor changes to header lines of the fasta file. After formatting, these versions of the UNITE database were used to train classifiers. All the formatting and training scripts above are automated through the constax.sh script; users need only specify the location of the reference database.

Taxonomy assignment

Taxonomy was assigned to the OTUs with RDPC version 11.5 [18, 32], UTAX from USEARCH version 8.1.1831 [19, 33], and SINTAX from USEARCH version 9.2 [16]. This step generated three tables (one from each classifier) with a taxonomic assignment at each of the seven ranks of the hierarchy (Kingdom, Phylum, Class, Order, Family, Genus, Species). We used the default settings, a 0.8 cut-off, to serve as a baseline for comparison. Researchers may choose to use less stringent cut-offs, depending on the goals of their studies. The cut-off can be specified in the config file contained in CONSTAX.tar.gz (Additional file 2).
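
The effect of a confidence cut-off can be sketched as below. This is not the filtering code shipped in CONSTAX.tar.gz: the tuple layout and function name are hypothetical, and the sketch assumes that once one rank falls below the cut-off, all finer ranks are also reported as unidentified.

```python
def apply_cutoff(lineage, cutoff=0.8):
    """Replace names whose confidence is below `cutoff` with 'Unidentified'.

    lineage: list of (rank, name, confidence) ordered from Kingdom to Species.
    Assumption: a rank below the cut-off also blanks every finer rank.
    """
    filtered = []
    passed = True
    for rank, name, conf in lineage:
        passed = passed and conf >= cutoff
        filtered.append((rank, name if passed else "Unidentified"))
    return filtered


# Hypothetical classifier output for one OTU.
example = [("Kingdom", "Fungi", 1.00),
           ("Phylum", "Ascomycota", 0.95),
           ("Class", "Sordariomycetes", 0.62),
           ("Order", "Hypocreales", 0.85)]
filtered_example = apply_cutoff(example)
```

Lowering the cut-off to 0.6 in this example would keep all four names, which is the trade-off between power and stringency mentioned above.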

Post-taxonomy data processing

A custom Python script (CombineTaxonomy.py) was developed to standardize the taxonomy table formats, filter the output files at the recommended quality score, and create the consensus taxonomy. Additionally, the script produces a combined and improved (higher power) taxonomy table by concatenating the information contained in the taxonomy tables from RDPC, UTAX, and SINTAX. Rules developed to merge the taxonomy assignments implemented in the Python script are detailed in Table 3. Briefly, a majority rule (two out of three classifiers in agreement) was used when classifiers did not assign the same name to a representative sequence. When there was no clear majority, the name with the highest quality score was chosen. The CombineTaxonomy.py script is also automated through the constax.sh script. All analyses downstream of the consensus OTU assignments were completed in R version 3.3.2 [34] and graphs were generated with the R package ‘ggplot2’ [35]. R code used to generate the graphs is also available in CONSTAX.tar.gz, and is automated through the constax.sh script.
Table 3

Rules adopted to generate the combined taxonomy table

| RDP | UTAX | SINTAX | CONSENSUS |
| 3 taxonomy assignments | | | |
| Taxon A | Taxon A | Taxon A | Taxon A |
| Taxon A | Taxon A | Taxon B | Taxon A |
| Taxon A | Taxon B | Taxon C | Use score |
| 2 taxonomy assignments + 1 unidentified | | | |
| Taxon A | Taxon A | Unidentified | Taxon A |
| Taxon A | Taxon B | Unidentified | Use score |
| 1 taxonomy assignment + 2 unidentified | | | |
| Taxon A | Unidentified | Unidentified | Taxon A |
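
The rules in Table 3 can be expressed as a short Python sketch for a single OTU at a single rank. This is an illustration written for this text, not the code in CombineTaxonomy.py; the function name and input layout are hypothetical, and returning "Unidentified" when no classifier gives a name is an assumption, since Table 3 does not list that case.

```python
from collections import Counter

UNIDENTIFIED = "Unidentified"


def consensus(calls):
    """Merge per-classifier calls for one OTU at one rank.

    calls: list of (name, score) pairs, one per classifier.
    """
    named = [(name, score) for name, score in calls if name != UNIDENTIFIED]
    if not named:
        # Assumption: nothing to merge, so the rank stays unidentified.
        return UNIDENTIFIED
    counts = Counter(name for name, _ in named)
    top_name, top_count = counts.most_common(1)[0]
    # Majority rule: one name is supported by more classifiers than any other.
    if list(counts.values()).count(top_count) == 1:
        return top_name
    # No majority (A/B/C or A/B/Unidentified): use the highest score.
    return max(named, key=lambda pair: pair[1])[0]


# Rows of Table 3 with hypothetical scores (RDP, UTAX, SINTAX).
row_majority = consensus([("A", 0.90), ("A", 0.85), ("B", 0.99)])
row_three_way = consensus([("A", 0.81), ("B", 0.95), ("C", 0.90)])
row_single = consensus([("A", 0.83), (UNIDENTIFIED, 0.0), (UNIDENTIFIED, 0.0)])
```

The score fallback compares confidence values produced by different algorithms as if they were on a common scale, which is a simplification.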

Results

Power of classifiers

Classification power differed across RDPC, UTAX, and SINTAX (Fig. 2). The total number of assigned OTUs also varied across datasets, ITS region, and OTU-generation approach. In general, the highest number of assignments at each level of the taxonomic hierarchy was observed in the ITS1-soil dataset shown in Fig. 2a [36]. Classifications for the ITS2-soil dataset [37] follow the same general pattern as ITS1-soil, but overall had lower power (Fig. 2b). Although UTAX had higher classification power for some ITS1 datasets at Kingdom level (Fig. 2c, Additional file 4: Figure S1A), SINTAX generally had the highest classification power (Fig. 2a-b, d). The ITS1-plant (Fig. 2c) and ITS2-plant (Fig. 2d) [10] datasets generated a greater number of unidentified OTUs with all three classifiers than the soil datasets did (Fig. 2, Additional file 4: Figure S1). A larger number of identified OTUs was detected for the ITS1-BC and ITS2-BC datasets when OTUs were generated by denoising (Additional file 4: Figure S1A-B) instead of clustering (Additional file 4: Figure S1C-D), at all levels except Species. A similar pattern was observed between the ITS1-BC and ITS2-BC datasets: more assigned OTUs were observed for ITS2-BC than for ITS1-BC, but not at every rank level (Additional file 4: Figure S1).
Fig. 2

Power of classifiers. Distribution of classified and unclassified OTUs for each classifier and across taxonomic level. a ITS1-soil dataset from Smith & Peay [36]. b ITS2-soil dataset from Oliver et al. [37]. c ITS1-plant and d ITS2-plant datasets from Agler et al. [10]

Depending on the dataset, the number of unidentified OTUs gradually, or sharply, increased at ranks below Kingdom level. Percent improvement of the consensus taxonomy assignments was calculated from the maximum and minimum numbers of classifications obtained at any given rank (Table 4). With CONSTAX, there was ~1% mean improvement at Kingdom level when the consensus taxonomy was compared with an individual classifier program. At other rank levels, there was 7–35% mean improvement. For ITS2 datasets, there was a 1–61% improvement at Family level (Table 4). For ITS1 datasets, there was a 1–59% improvement at Family level (Table 4). At Species level there was a 35% mean improvement across all datasets (Table 4). The higher end of these ranges is due to poor classification of OTUs, especially ITS2 OTUs, using UTAX. If the percent improvement is recalculated without UTAX, the maximum percent improvement drops from 98% to 52% (Table 4).
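
The equation for percent increase referred to in the Table 4 footnote is not reproduced in this text. The Python sketch below assumes the form 100 × (N_consensus − N_classifier) / N_consensus, where N is the number of assigned OTUs. That form is an inference: it matches the ITS1-Soil Species entries of Table 4 when the consensus assigns 124 OTUs (500 minus the 376 unidentified in Table 5). The per-classifier counts of 117 and 59 used below are back-calculated for illustration and are not values reported in the paper.

```python
def percent_increase(n_consensus, n_classifier):
    """Percent of consensus assignments that one classifier alone did not make.

    Assumed form, inferred from tabulated values rather than taken from the paper.
    """
    if n_consensus <= 0:
        raise ValueError("the consensus must assign at least one OTU")
    return 100.0 * (n_consensus - n_classifier) / n_consensus


# ITS1-Soil, Species rank: 500 OTUs, 376 unidentified in the consensus.
n_consensus = 500 - 376
# Hypothetical classifier counts, back-calculated for illustration.
best_classifier = round(percent_increase(n_consensus, 117), 2)   # 5.65
worst_classifier = round(percent_increase(n_consensus, 59), 2)   # 52.42
```

Under this reading, the smallest improvement comes from comparing the consensus with the classifier that assigned the most OTUs, and the largest from the classifier that assigned the fewest.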
Table 4

Range of percent improvement using CONSTAX

| Taxonomic Rank^a | Percent Increase^b | ITS1-Soil | ITS2-Soil | ITS1-Plant | ITS2-Plant | ITS1-BC^c | ITS2-BC^c | ITS1-UN^c,d | ITS2-UN^c,d | Mean Increase |
| Kingdom | max. | 0.00 | 0.00 (1.60) | 0.00 (0.20) | 0.00 | 0.00 (1.00) | 0.00 (0.20) | 0.00 (2.20) | 0.00 | 0.81 (1.14) |
| Kingdom | min. | 0.40 | 3.00 | 1.40 | 0.20 | 3.60 | 1.40 | 2.60 | 0.40 | |
| Phylum | max. | 0.47 | 1.29 | 4.46 (5.25) | 2.84 | 6.05 | 1.70 | 5.57 (7.62) | 2.76 | 6.83 (6.50) |
| Phylum | min. | 5.21 (4.03) | 13.18 (5.43) | 17.06 | 18.01 | 11.46 | 12.24 | 11.73 | 8.90 | |
| Class | max. | 1.58 | 1.83 | 0.93 | 4.89 | 3.04 | 0.84 | 3.98 | 3.70 | 8.56 (5.36) |
| Class | min. | 9.23 (2.11) | 24.77 (7.65) | 21.98 (18.27) | 26.63 (22.28) | 18.26 (8.70) | 9.28 (5.49) | 13.94 (5.98) | 9.26 (5.19) | |
| Order | max. | 1.69 | 2.47 | 0.33 | 2.58 | 2.48 | 1.40 | 5.75 | 3.23 | 10.94 (5.73) |
| Order | min. | 11.83 (4.79) | 27.21 (6.71) | 37.42 (20.86) | 42.58 (24.52) | 19.31 (7.92) | 7.44 (5.58) | 19.91 (8.85) | 11.29 (4.03) | |
| Family | max. | 1.36 | 3.54 | 2.47 | 3.16 | 1.86 | 1.10 | 6.88 | 6.13 | 15.72 (6.43) |
| Family | min. | 13.9 (1.69) | 30.81 (6.57) | 58.02 (22.63) | 61.05 (26.32) | 20.50 (9.94) | 11.60 (6.08) | 32.80 (8.99) | 27.83 (7.08) | |
| Genus | max. | 2.56 | 6.25 | 2.51 | 5.33 | 2.40 | 1.89 | 9.15 | 9.03 | 27.06 (9.04) |
| Genus | min. | 28.21 (3.85) | 53.13 (8.59) | 88.94 (37.69) | 85.33 (36.00) | 31.20 (7.20) | 35.22 (10.69) | 63.38 (10.56) | 62.58 (9.03) | |
| Species | max. | 5.65 | 8.51 | 2.70 | 1.92 | 3.19 | 1.83 | 9.28 | 13.68 | 34.65 (13.20) |
| Species | min. | 52.42 (16.13) | 65.96 (14.89) | 98.65 (47.97) | 96.15 (51.92) | 41.49 (9.57) | 51.38 (21.10) | 81.44 (11.34) | 89.47 (17.89) | |

^a Percent improvement calculated with RDP, SINTAX, and UTAX outputs (numbers in parentheses calculated without including UTAX, only differing values displayed). Ranges represent minimum and maximum improvement when compared to all three classifiers at a given level

^b Equation to calculate percent increase, where N = assigned OTUs.

^c Reads are forward (ITS1) or reverse (ITS2), not merged read pairs

^d Dataset was processed with denoising instead of clustering

Consistency of classifiers

Generally, all the classifiers were consistent in OTU assignments. Based on the consensus taxonomy tables, no bias was observed toward a fungal lineage from any of the classifiers. Nearly all OTUs were identified at Kingdom level (Table 5, Additional file 5: Table S1). There were few examples across the datasets where a single OTU was placed into a unique lineage by one or more of the classifiers. Only 1.24% ± 0.006 (st. dev.) of OTUs were differentially assigned across the datasets. This differential assignment was most frequently observed at Kingdom level, where OTUs were placed with low confidence into either Kingdom Fungi or Protista (Table 5). These OTUs were rarely assigned at ranks below Kingdom, and never below Class; they may be novel sequences, or PCR or sequencing errors. Across all datasets used in the present study (4000 OTUs/ESVs), there were two examples of OTUs assigned to unique fungal lineages. These were found only in the ITS1-BC and ITS2-BC datasets (Table 5). The ITS1-BC OTU diverged at Class; the OTU was assigned to Eurotiomycetes and Sordariomycetes by RDPC and UTAX, respectively, and was unidentified by SINTAX. This OTU did not have an assignment lower than Family. The assignment of the ITS2-BC OTU diverged at Phylum; RDPC and SINTAX placed the ITS2-BC OTU into the Basidiomycota, and UTAX placed this OTU in the Ascomycota. The assignment diverged again at Class, where it was placed into the Pucciniomycetes by RDPC, and the Agaricomycetes by SINTAX.
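Table 5

Distribution of identically classified, uniquely classified, and unidentified OTUs across all taxonomic ranks for data presented in Fig. 2

| | Kingdom | Phylum | Class | Order | Family | Genus | Species |
| ITS1-Soil | | | | | | | |
| 3 classified, identical | 498 | 393 | 339 | 306 | 253 | 167 | 57 |
| 3 classified, 1 unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 classified, 3 unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 classified, identical | 2 | 17 | 31 | 33 | 34 | 53 | 42 |
| 2 classified, unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 classified | 0 | 12 | 9 | 16 | 8 | 14 | 25 |
| 1 classified (RDP) | 0 | 0 | 1 | 1 | 5 | 9 | 7 |
| 1 classified (SINTAX) | 0 | 11 | 5 | 12 | 3 | 5 | 18 |
| 1 classified (UTAX) | 0 | 1 | 3 | 3 | 0 | 0 | 0 |
| Unidentified | 0 | 78 | 121 | 145 | 205 | 266 | 376 |
| ITS2-Soil | | | | | | | |
| 3 classified, identical | 481 | 332 | 242 | 203 | 135 | 60 | 16 |
| 3 classified, 1 unique | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 classified, 3 unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 classified, identical | 2 | 33 | 58 | 57 | 45 | 49 | 20 |
| 2 classified, unique | 7 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 classified | 7 | 22 | 27 | 23 | 18 | 19 | 11 |
| 1 classified (RDP) | 0 | 3 | 4 | 4 | 6 | 8 | 4 |
| 1 classified (SINTAX) | 7 | 17 | 1 | 17 | 11 | 11 | 7 |
| 1 classified (UTAX) | 0 | 2 | 22 | 2 | 1 | 0 | 0 |
| Unidentified | 0 | 113 | 173 | 217 | 302 | 372 | 453 |
| ITS1-Plant | | | | | | | |
| 3 classified, identical | 490 | 304 | 234 | 181 | 98 | 22 | 2 |
| 3 classified, 1 unique | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 classified, 3 unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 classified, identical | 4 | 52 | 45 | 65 | 88 | 97 | 71 |
| 2 classified, unique | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 classified | 0 | 25 | 44 | 56 | 57 | 80 | 75 |
| 1 classified (RDP) | 0 | 2 | 1 | 0 | 5 | 5 | 4 |
| 1 classified (SINTAX) | 0 | 9 | 41 | 0 | 51 | 75 | 71 |
| 1 classified (UTAX) | 0 | 14 | 2 | 1 | 1 | 0 | 0 |
| Unidentified | 0 | 119 | 177 | 198 | 257 | 301 | 352 |
| ITS2-Plant | | | | | | | |
| 3 classified, identical | 499 | 166 | 120 | 83 | 36 | 11 | 2 |
| 3 classified, 1 unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 classified, 3 unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 classified, identical | 1 | 20 | 29 | 36 | 32 | 33 | 22 |
| 2 classified, unique | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 classified | 0 | 25 | 35 | 36 | 27 | 31 | 28 |
| 1 classified (RDP) | 0 | 1 | 2 | 2 | 3 | 4 | 1 |
| 1 classified (SINTAX) | 0 | 3 | 31 | 32 | 0 | 28 | 27 |
| 1 classified (UTAX) | 0 | 21 | 2 | 2 | 24 | 0 | 0 |
| Unidentified | 0 | 289 | 316 | 345 | 405 | 425 | 448 |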

Python tool outputs

CONSTAX is implemented in Python and provided as a Bourne Shell executable, constax.sh. After installation of the required dependencies, the user must modify paths and parameters in constax.sh and the config file, both of which can be found in CONSTAX.tar.gz (Additional file 2). The Python scripts called by constax.sh are provided independently and can be easily modified for use with other classifiers or reference databases. After implementation of constax.sh, filtered versions of all taxonomy tables for the given cutoff are generated, alongside the four main output files: i) consensus_taxonomy.txt, the final higher power taxonomy table; ii) combined_taxonomy.txt, which is a large table of all three taxonomy tables side-by-side in addition to the consensus taxonomy; iii) otu_taxonomy_CountClassified.txt, which details assigned and unidentified OTUs at each rank level; and iv) Classification_Summary.txt, which lists the total counts of all unique taxa at a given rank level.

Discussion

Factors that influence the composition and structure of microbial communities are mainly confined to three different groups: sample origin (e.g., soil or water), laboratory methods (e.g., primer selection, PCR conditions, library preparation), and post-sequencing bioinformatic analysis. Since there are sample or methodological challenges at several steps of microbial community studies that can ultimately influence taxonomic classification, we standardize and improve the taxonomic classification step of fungal microbiome studies with CONSTAX. CONSTAX improves taxonomy assignment of fungal OTUs regardless of the strategies researchers choose to reduce the sample or methodological challenges. Linking OTUs to functionally informative names, which largely requires genus- or species-level resolution, is key to addressing biological and ecological hypotheses in fungal community studies. Considerable time should be invested into choosing optimal tools for taxonomic analysis. In this study, eight fungal amplicon datasets were assigned taxonomy using the same reference database [23] and three taxonomy assignment programs were compared: RDPC [18, 32], UTAX [19, 33], and SINTAX [20]. The taxonomic classification step is arguably one of the most delicate steps of the pipeline for amplicon-based microbial ecology studies, because taxon names are largely the basis by which scientists attach biological interpretation to the data. Our results showed minor differences across taxonomic classification approaches using thresholds chosen a priori. The UTAX classifier generated greater numbers of unidentified OTUs compared with RDPC and SINTAX, a pattern that is pronounced in the ITS2 datasets. We also found that more non-fungal OTUs were recovered from the ITS2 sequences, indicating that primers for this region may be less fungi-specific than those used for amplifying the ITS1 region.
The ITS1 region has been shown to be more conserved in sequence and length for most fungal lineages compared with ITS2 [38-40]. Whether the ITS1 or ITS2 region provides the best taxonomic resolution has been investigated previously with Sanger sequence data [3, 37] and pyrosequence data [9, 41]. Apart from the small bias of ITS1 against early diverging fungi, these regions yield similar profiles of fungal communities and either region is considered suitable for community studies. Regardless of primer choice, we showed that use of multiple taxonomy assignment algorithms resulted in consistent classifications when an appropriate OTU-clustering threshold level is used. Our tool, CONSTAX, implements the following best practice tips for taxonomy assignment of ITS datasets: i) Use more than one classifier program, as not one is clearly superior to others; ii) Obtain a consensus taxonomy after running multiple classifiers; iii) Use the most recent release of software. The classifier programs tested here differ slightly in power, so performing taxonomic classifications with multiple programs, and combining the results will result in a stronger assignment with higher resolution. When designing experiments, it behooves researchers to carefully consider their target organisms when choosing the ITS barcode region and selecting primers. When investigating broad patterns of fungi, use of ITS alone should be sufficient, but if there is interest in a specific group of fungi, additional markers for those lineages (such as 18S rRNA gene for arbuscular mycorrhizal fungi) may be needed [42]. Further, there are limitations in making functional inferences from fungal ITS amplicon data. If the research questions are aimed at specific species or functions, metagenomics may be a more appropriate approach than amplicon-based community analyses.

Conclusion

We provide a tool, CONSTAX, for generating consensus taxonomy of targeted amplicon sequence data, and demonstrate that it improves taxonomy assignments of environmental OTUs. Taxonomic assignment will improve as database completeness improves, especially for the RDPC, since that algorithm functions best when there are multiple representatives for a group (genus or species). The mycological community should continue to generate high quality ITS reference sequences for their research organisms and from Herbarium specimens, which will further enhance the performance of taxonomy assignment algorithms.

Additional file 1: CONSTAX tutorial. Implementation of code and scripts for database formatting and trimming, taxonomy assignment, and post-taxonomy assignment filtering. (PDF 699 kb)
Additional file 2: CONSTAX.tar.gz compressed directory. Contains test datasets, Python, Shell, and R scripts to use the tool. (GZ 175 kb)
Additional file 3: otu_processing.sh pipeline. Contains code for sequence quality control and OTU-picking. (SH 2 kb)
Additional file 4: Figure S1. Power of taxonomy classifiers. Distribution of classified and unclassified OTUs for each classifier and across taxonomic level. (A) ITS1-UN and (B) ITS2-UN data analyzed using UNOISE. (C) ITS1-BC and (D) ITS2-BC data analyzed with UPARSE. (PDF 304 kb)
Additional file 5: Table S1. Distribution of identically classified, uniquely classified, and unidentified OTUs across all taxonomic ranks for data presented in Additional file 4: Figure S1 (Benucci et al., unpublished). (XLSX 47 kb)
  33 in total

1.  Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi.

Authors:  Conrad L Schoch; Keith A Seifert; Sabine Huhndorf; Vincent Robert; John L Spouge; C André Levesque; Wen Chen
Journal:  Proc Natl Acad Sci U S A       Date:  2012-03-27       Impact factor: 11.205

2.  UNITE: a database providing web-based methods for the molecular identification of ectomycorrhizal fungi.

Authors:  Urmas Kõljalg; Karl-Henrik Larsson; Kessy Abarenkov; R Henrik Nilsson; Ian J Alexander; Ursula Eberhardt; Susanne Erland; Klaus Høiland; Rasmus Kjøller; Ellen Larsson; Taina Pennanen; Robin Sen; Andy F S Taylor; Leho Tedersoo; Trude Vrålstad; Björn M Ursing
Journal:  New Phytol       Date:  2005-06       Impact factor: 10.151

3.  UPARSE: highly accurate OTU sequences from microbial amplicon reads.

Authors:  Robert C Edgar
Journal:  Nat Methods       Date:  2013-08-18       Impact factor: 28.547

4.  A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis.

Authors:  Yijun Sun; Yunpeng Cai; Susan M Huse; Rob Knight; William G Farmerie; Xiaoyu Wang; Volker Mai
Journal:  Brief Bioinform       Date:  2011-04-27       Impact factor: 11.622

5.  ITS1 versus ITS2 as DNA metabarcodes for fungi.

Authors:  R Blaalid; S Kumar; R H Nilsson; K Abarenkov; P M Kirk; H Kauserud
Journal:  Mol Ecol Resour       Date:  2013-01-25       Impact factor: 7.090

6.  UCHIME improves sensitivity and speed of chimera detection.

Authors:  Robert C Edgar; Brian J Haas; Jose C Clemente; Christopher Quince; Rob Knight
Journal:  Bioinformatics       Date:  2011-06-23       Impact factor: 6.937

7.  The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis.

Authors:  J R Cole; B Chai; R J Farris; Q Wang; S A Kulam; D M McGarrell; G M Garrity; J M Tiedje
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

8.  Intraspecific ITS variability in the kingdom fungi as expressed in the international sequence databases and its implications for molecular species identification.

Authors:  R Henrik Nilsson; Erik Kristiansson; Martin Ryberg; Nils Hallenberg; Karl-Henrik Larsson
Journal:  Evol Bioinform Online       Date:  2008-05-26       Impact factor: 1.625

9.  PEAR: a fast and accurate Illumina Paired-End reAd mergeR.

Authors:  Jiajie Zhang; Kassian Kobert; Tomáš Flouri; Alexandros Stamatakis
Journal:  Bioinformatics       Date:  2013-10-18       Impact factor: 6.937

10.  PIPITS: an automated pipeline for analyses of fungal internal transcribed spacer sequences from the Illumina sequencing platform.

Authors:  Hyun S Gweon; Anna Oliver; Joanne Taylor; Tim Booth; Melanie Gibbs; Daniel S Read; Robert I Griffiths; Karsten Schonrogge
Journal:  Methods Ecol Evol       Date:  2015-05-25       Impact factor: 7.781
