Stéphane Mahé, Marie Duhamel, Thomas Le Calvez, Laetitia Guillot, Ludmila Sarbu, Anthony Bretaudeau, Olivier Collin, Alexis Dufresne, E Toby Kiers, Philippe Vandenkoornhuyse.
Abstract
BACKGROUND: In environmental sequencing studies, fungi can be identified based on nucleic acid sequences, using either highly variable sequences as species barcodes or conserved sequences containing a high-quality phylogenetic signal. For the latter, identification relies on phylogenetic analyses and the adoption of the phylogenetic species concept. Such analysis requires that the reference sequences are well identified and deposited in public-access databases. However, many entries in the public sequence databases are problematic in terms of quality and reliability, and these data require screening to ensure correct phylogenetic interpretation. METHODS AND PRINCIPAL
Year: 2012 PMID: 23028445 PMCID: PMC3441585 DOI: 10.1371/journal.pone.0043117
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Flowchart of the data in the PHYMYCO-DB.
The arrows indicate the flow of gene sequences extracted from the GenBank database through the automated and manual curation steps. All sequences made available to users have passed both curation processes. Expert manual curation is performed after each upgrade of the database (i.e. four times per year).
Figure 2. Visualisation of sequences deleted by the manual curation after alignment (ClustalX 2.1).
The sequences highlighted in blue are examples of sequences removed from PHYMYCO-DB. A sequence can be compromised by erroneous sequencing (e.g. repeated gaps), wrong annotation (e.g. a sequence corresponding to another clade), a high number of undetermined nucleotides, homopolymer insertions, erroneous alignment, reverse-complementary orientation, long insertions or introns, or deletions.
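Two of the criteria listed in this caption, undetermined nucleotides and homopolymer runs, are easy to pre-screen automatically before an expert looks at the alignment. The Python sketch below is only an illustration and is not the PHYMYCO-DB pipeline: the function names and both thresholds (`max_ambiguous`, `max_homopolymer`) are assumptions chosen for the example, and the other criteria (wrong clade, introns, misalignment) still need inspection of the alignment itself.

```python
import re


def ambiguous_fraction(seq: str) -> float:
    """Fraction of residues that are not A, C, G, T or U (alignment gaps ignored)."""
    residues = [c for c in seq.upper() if c not in "-."]
    if not residues:
        return 0.0
    ambiguous = sum(1 for c in residues if c not in "ACGTU")
    return ambiguous / len(residues)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated character (gaps removed first)."""
    residues = seq.upper().replace("-", "").replace(".", "")
    runs = re.finditer(r"(.)\1*", residues)
    return max((len(m.group(0)) for m in runs), default=0)


def flag_sequence(seq: str, max_ambiguous: float = 0.01, max_homopolymer: int = 8) -> list:
    """Return the names of the checks a sequence fails.

    Both thresholds are hypothetical defaults for illustration only.
    """
    flags = []
    if ambiguous_fraction(seq) > max_ambiguous:
        flags.append("ambiguous")
    if longest_homopolymer(seq) > max_homopolymer:
        flags.append("homopolymer")
    return flags
```

A sequence that returns an empty list has passed only these two checks; flagged sequences would be candidates for manual review, not for automatic deletion.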
Figure 3. SSU rRNA phylogenetic positions of deep-sea Chytridiomycota (colored terminals) along with the closest known related SSU rRNA fungal sequences.
The topology was built with MrBayes v.3.1.2 from a ClustalW 2.1 alignment (scale bar: 0.1 estimated substitutions per site; 3,000,000 generations sampled every 100 generations; average standard deviation of split frequencies: 0.004140). The GTR+I+G model was selected by jModelTest 0.1. Node support values are given in the following order: Maximum Parsimony/Maximum Likelihood (both calculated with PAUP 4.0β10, 500 bootstraps)/MrBayes. Corallochytrium limacisporum (L42528), a putative choanoflagellate, was used as the outgroup. Maunachytrium keaense, which is not part of PHYMYCO-DB, was also used to help build the tree. All sequences are listed with their GenBank accession numbers. The topologies were congruent except where indicated by dotted lines in the figure. Thin lines indicate bootstrap values >50% and BPP >0.5 (MP/ML/MrBayes); thick lines indicate bootstrap values >70% and BPP >0.7 (MP/ML/MrBayes). The sequences belonging to the Lobulomycetaceae family are shown with their BLASTn maximum identity percentage relative to the three deep-sea Chytridiomycota OTUs.
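The line-weight convention in this caption can be written as a small decision rule. The Python sketch below is one possible reading and not taken from the paper: it assumes that all three support values (MP bootstrap, ML bootstrap, Bayesian posterior probability) must exceed the threshold for a branch to qualify, and the function name `line_style` and the `"none"` label for branches below both thresholds are hypothetical.

```python
def line_style(mp_bootstrap: float, ml_bootstrap: float, bpp: float) -> str:
    """Classify a branch from its MP/ML/MrBayes support values.

    Assumption: every one of the three values must pass the threshold.
    Bootstrap values are percentages; bpp is a probability between 0 and 1.
    """
    if mp_bootstrap > 70 and ml_bootstrap > 70 and bpp > 0.7:
        return "thick"
    if mp_bootstrap > 50 and ml_bootstrap > 50 and bpp > 0.5:
        return "thin"
    return "none"


# Example: a node labelled 85/90/1.00 in MP/ML/MrBayes order.
print(line_style(85, 90, 1.00))
```

If the caption instead intends a branch to qualify when any one method passes, the `and` conditions would become `or`; the caption alone does not settle this.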