Laure Guillou, Dipankar Bachar, Stéphane Audic, David Bass, Cédric Berney, Lucie Bittner, Christophe Boutte, Gaétan Burgaud, Colomban de Vargas, Johan Decelle, Javier Del Campo, John R Dolan, Micah Dunthorn, Bente Edvardsen, Maria Holzmann, Wiebe H C F Kooistra, Enrique Lara, Noan Le Bescot, Ramiro Logares, Frédéric Mahé, Ramon Massana, Marina Montresor, Raphael Morard, Fabrice Not, Jan Pawlowski, Ian Probert, Anne-Laure Sauvadet, Raffaele Siano, Thorsten Stoeck, Daniel Vaulot, Pascal Zimmermann, Richard Christen.
Abstract
The interrogation of genetic markers in environmental meta-barcoding studies is currently seriously hindered by the lack of taxonomically curated reference data sets for the targeted genes. The Protist Ribosomal Reference database (PR(2), http://ssu-rrna.org/) provides unique access to eukaryotic small sub-unit (SSU) ribosomal RNA and DNA sequences, with curated taxonomy. The database mainly consists of nuclear-encoded protistan sequences. However, metazoans, land plants, macrosporic fungi and eukaryotic organelles (mitochondrion, plastid and others) are also included because they are useful for the analysis of high-throughput sequencing data sets. Introns and putative chimeric sequences have also been carefully checked. The taxonomic assignment of each sequence consists of eight unique taxonomic fields. In total, 136 866 sequences are nuclear encoded, 45 708 (36 051 mitochondrial and 9657 chloroplastic) are from organelles, the remainder being putative chimeric sequences. The website allows users to download sequences from the entire database or from subsets of it (including representative sequences after clustering at a given level of similarity). Different web tools also allow searches by sequence similarity. The presence of both rRNA and rDNA sequences, the handling of introns (crucial for eukaryotic sequences), a normalized eight-term ranked taxonomy and updates following new GenBank releases were made possible by a long-term collaboration between experts in taxonomy and computer scientists.
Year: 2012 PMID: 23193267 PMCID: PMC3531120 DOI: 10.1093/nar/gks1160
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Number of nuclear-encoded sequences in PR2 as annotated at the Super-Group taxonomic level
| Super-group | n1 | n2 |
|---|---|---|
| Alveolata | 20 760 | 20 255 |
| Amoebozoa | 1902 | 1880 |
| Apusozoa | 254 | 242 |
| Archaeplastida | 16 309 | 16 092 |
| Eukaryota_Mikro | 3 | 3 |
| Eukaryota_X | 54 | 54 |
| Excavata | 2871 | 2869 |
| Hacrobia | 2192 | 2132 |
| Opisthokonta | 75 056 | 74 484 |
| Rhizaria | 7581 | 7459 |
| Stramenopiles | 9884 | 9640 |
| Total nuclear-encoded Eukaryota | 136 866 | 135 110 |
| Apicoplast | 26 | 26 |
| Chloroplast SSU | 9657 | 9657 |
| Hydrogenosome SSU | 6 | 6 |
| Mitochondrion SSU | 36 051 | 36 051 |
| Nucleomorph SSU (18S) | 264 | 262 |
n1, total number; n2, excluding putative chimera; Super-Group, rank 2 taxonomy.
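The difference between n1 and n2 gives the number of sequences flagged as putative chimeras in each super-group. The following Python sketch recomputes those counts and their share of n1 from the nuclear-encoded rows of the table; the names `counts` and `chimera_stats` are illustrative and do not come from the PR2 website or its tools.

```python
# (n1, n2) for each nuclear-encoded super-group, copied from the table above.
counts = {
    "Alveolata": (20760, 20255),
    "Amoebozoa": (1902, 1880),
    "Apusozoa": (254, 242),
    "Archaeplastida": (16309, 16092),
    "Eukaryota_Mikro": (3, 3),
    "Eukaryota_X": (54, 54),
    "Excavata": (2871, 2869),
    "Hacrobia": (2192, 2132),
    "Opisthokonta": (75056, 74484),
    "Rhizaria": (7581, 7459),
    "Stramenopiles": (9884, 9640),
}


def chimera_stats(n1, n2):
    """Return (putative chimera count, percentage of n1) for one row."""
    flagged = n1 - n2
    return flagged, 100.0 * flagged / n1


# Column totals; these should reproduce the "Total nuclear-encoded" row.
total_n1 = sum(n1 for n1, _ in counts.values())
total_n2 = sum(n2 for _, n2 in counts.values())

# The two organelle counts quoted in the abstract, taken from the table.
organelle_sum = 9657 + 36051  # Chloroplast SSU + Mitochondrion SSU

if __name__ == "__main__":
    for name, (n1, n2) in counts.items():
        flagged, pct = chimera_stats(n1, n2)
        print(f"{name:<16} {flagged:>5} {pct:6.2f}%")
    flagged, pct = chimera_stats(total_n1, total_n2)
    print(f"{'Total':<16} {flagged:>5} {pct:6.2f}%")
```

The column sums agree with the totals row (136 866 and 135 110), and the chloroplast and mitochondrion counts add up to the 45 708 organelle sequences quoted in the abstract.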
Figure 1. Total number of SSU rDNA gene sequences in the PR2 database for each main eukaryotic lineage (all sequences = grey + black, complete or nearly complete sequences in light-grey). Note that nucleomorphs were extracted from Archaeplastida. Numbers after the bars indicate the percentages of sequences that include the following: (i) the V4 region as defined by primers forward CCAGCASCYGCGGTAATTCC and reverse ACTTTCGTTCTTGATYRA used during the European Biomarks project; (ii) the V9 region as defined by primers forward GTACACACCGCCCGTC and reverse TGATCCTTCTGCAGGTTCACCTAC used during the European Biomarks project; and (iii) the V9 region defined by primers forward TTGTACACACCGCCC and reverse CCTTCYGCAGGTTCACCTAC used by the WAMPS project. For Opisthokonta, the number in white is the total number of sequences.
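The primers in the caption contain IUPAC degenerate bases (S, Y, R), so testing whether a sequence includes a primer-defined region requires expanding those codes. The Python sketch below shows one way to do this with regular expressions. It assumes exact matching on the given strand, which is a simplification: the caption does not say how the reported percentages were computed, and real pipelines usually tolerate mismatches. The function names are hypothetical and the example sequence is synthetic.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also maps degenerate codes to their complements.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a DNA string, degenerate codes included."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern string."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_region(seq, forward, reverse):
    """Return the stretch between the two primer sites, primers excluded.

    Returns None when either primer site is missing. Only exact matches
    on the given strand are considered (an assumption of this sketch).
    """
    seq = seq.upper()
    fwd = re.search(primer_regex(forward), seq)
    if fwd is None:
        return None
    tail = seq[fwd.end():]
    # The reverse primer anneals to the opposite strand, so its site on
    # this strand is the reverse complement of the primer.
    rev = re.search(primer_regex(reverse_complement(reverse)), tail)
    if rev is None:
        return None
    return tail[:rev.start()]


# V4 primers as quoted in the figure caption.
V4_FORWARD = "CCAGCASCYGCGGTAATTCC"
V4_REVERSE = "ACTTTCGTTCTTGATYRA"

if __name__ == "__main__":
    # Synthetic sequence built around the primer sites, not real data.
    toy = "AAAA" + "CCAGCAGCCGCGGTAATTCC" + "ACGTACGT" + "TTAATCAAGAACGAAAGT" + "GGG"
    print(extract_region(toy, V4_FORWARD, V4_REVERSE))
```

Applying `extract_region` to every sequence of a lineage and counting the non-`None` results would give a percentage comparable in spirit to those shown after the bars, under the exact-match assumption stated above.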