
How and why DNA barcodes underestimate the diversity of microbial eukaryotes.

Gwenael Piganeau, Adam Eyre-Walker, Severine Jancek, Nigel Grimsley, Hervé Moreau.

Abstract

BACKGROUND: Because many picoplanktonic eukaryotic species cannot currently be maintained in culture, direct sequencing of PCR-amplified 18S ribosomal gene DNA fragments from filtered sea-water has been successfully used to investigate the astounding diversity of these organisms. The recognition of many novel planktonic organisms is thus based solely on their 18S rDNA sequence. However, a species delimited by its 18S rDNA sequence might contain many cryptic species, which are highly differentiated in their protein coding sequences.
PRINCIPAL FINDINGS: Here, we investigate the issue of species identification from one gene to the whole genome sequence. Using 52 whole genome DNA sequences, we estimated the global genetic divergence in protein coding genes between organisms from different lineages and compared this to their ribosomal gene sequence divergences. We show that this relationship between proteome divergence and 18S divergence is lineage dependent. Unicellular lineages have especially low 18S divergences relative to their protein sequence divergences, suggesting that 18S ribosomal genes are too conservative to assess planktonic eukaryotic diversity. We provide an explanation for this lineage dependency, which suggests that most species with large effective population sizes will show far less divergence in 18S than protein coding sequences.
CONCLUSIONS: There is therefore a trade-off between using genes that are easy to amplify in all species, but which by their nature are highly conserved and underestimate the true number of species, and using genes that give a better description of the number of species, but which are more difficult to amplify. We have shown that this trade-off differs between unicellular and multicellular organisms as a likely consequence of differences in effective population sizes. We anticipate that biodiversity of microbial eukaryotic species is underestimated and that numerous "cryptic species" will become discernable with the future acquisition of genomic and metagenomic sequences.

Year:  2011        PMID: 21347361      PMCID: PMC3037371          DOI: 10.1371/journal.pone.0016342

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Our understanding of the evolution of eukaryotes was revolutionized when it became possible to compare sequenced marker genes, notably the ribosomal genes, among many organisms [1]. In practice, ribosomal genes are often the only markers available for estimating the diversity of unicellular eukaryotes, especially in the Chromalveolates, Excavata and Rhizaria groups, which have few sequenced representatives. They are also the only markers used in the analysis of environmental or metagenomic DNA sequence datasets [2], [3]. It is thus becoming crucially important to know how well these signatures represent the extent of diversity in the exploding body of data that will become available over the next ten years as revolutionary sequencing technologies are used in pan-oceanic metagenomic campaigns [4], [5]. Marine metagenomic studies rely on a pragmatic species concept: sequences are declared as being from separate species or genera based upon an arbitrary level of sequence divergence at a marker locus, typically the 18S ribosomal gene (18S rDNA) [6]. In this study, we analysed how genome divergence, estimated from amino-acid changes in protein coding genes, compares with 18S ribosomal divergence, the universal marker for planktonic eukaryote biodiversity.

Methods

Whole-genome predicted protein data were downloaded from GenBank, JGI, Genolevure, Ensembl [7], PLAZA [8] and organism-specific databases (Table 1). Complete 18S rDNA sequences were downloaded from GenBank or extracted from the whole genome sequence by screening the complete genome with a complete 18S rDNA sequence from a closely related species. For the primate data, 18S rDNA sequences were reassembled from the GenBank Trace archive (Table 1).
Table 1

Genome data and 18S rDNA data used for analysis.

| Lineage | Species | Database | URL | Release | Number of genes | 18S rDNA sequence |
|---|---|---|---|---|---|---|
| Diptera | Aedes aegypti | VectorBase | http://aaegypti.vectorbase.org/ | AaegL1.1 | 16789 | from genome assembly |
| Diptera | Culex pipiens | VectorBase | http://cpipiens.vectorbase.org/ | CpipJ1.2 | 18883 | from genome assembly |
| Diptera | Drosophila ananassae | FlyBase | ftp://ftp.flybase.net/genomes/ | r1.3 | 15070 | from genome assembly |
| Diptera | Drosophila melanogaster | FlyBase | ftp://ftp.flybase.net/genomes/ | r5.9 | 21064 | M21017.1 |
| Diptera | Drosophila erecta | FlyBase | ftp://ftp.flybase.net/genomes/ | r1.3 | 15048 | from genome assembly |
| Diptera | Drosophila yakuba | FlyBase | ftp://ftp.flybase.net/genomes/ | r1.3 | 16082 | from genome assembly |
| Diptera | Drosophila grimshawi | FlyBase | ftp://ftp.flybase.net/genomes/ | r1.3 | 14986 | from genome assembly |
| Diptera | Drosophila willistoni | FlyBase | ftp://ftp.flybase.net/genomes/ | r1.3 | 15513 | from genome assembly |
| Diptera | Drosophila persimilis | FlyBase | ftp://ftp.flybase.net/genomes/ | r1.3 | 16878 | from genome assembly |
| Diptera | Drosophila pseudoobscura | FlyBase | ftp://ftp.flybase.net/genomes/ | r2.3 | 16071 | AY03717 |
| Diptera | Drosophila sechellia | FlyBase | ftp://ftp.flybase.net/genomes/ | r1.3 | 16471 | from genome assembly |
| Diptera | Drosophila simulans | FlyBase | ftp://ftp.flybase.net/genomes/ | r1.3 | 15415 | AY037174.1 |
| Vertebrata | Homo sapiens | Ensembl | http://archive.ensembl.org/ | v54 | 47509 | M10098 |
| Vertebrata | Pan troglodytes | Ensembl | http://archive.ensembl.org/ | v54 | 34142 | rebuilt from Trace |
| Vertebrata | Mus musculus | Ensembl | http://archive.ensembl.org/ | v38 | 31986 | X00686.1 |
| Vertebrata | Rattus norvegicus | Ensembl | http://archive.ensembl.org/ | v54 | 32948 | X01117 |
| Vertebrata | Macaca mulatta | Ensembl | http://archive.ensembl.org/ | v54 | 36384 | rebuilt from Trace |
| Vertebrata | Pongo pygmaeus | Ensembl | http://archive.ensembl.org/ | v54 | 23533 | rebuilt from Trace |
| Vertebrata | Bos taurus | Ensembl | http://archive.ensembl.org/ | v54 | 26977 | DQ222453.1 |
| Vertebrata | Equus caballus | Ensembl | http://archive.ensembl.org/ | v54 | 22641 | AJ311673.1 |
| Vertebrata | Gallus gallus | Ensembl | http://archive.ensembl.org/ | v47 | 22195 | AF173612 |
| Vertebrata | Xenopus tropicalis | Ensembl | http://archive.ensembl.org/ | v54 | 27710 | from genome assembly |
| Streptophyta | Oryza sativa | Rice | http://rice.plantbiology.msu.edu/ | v6 | 67393 | from genome assembly |
| Streptophyta | Sorghum bicolor | JGI | http://genome.jgi-psf.org/Sorbi1/Sorbi1.download.ftp.html | Sbi1_4 | 34496 | from genome assembly |
| Streptophyta | Populus trichocarpa | JGI | http://genome.jgi-psf.org/ | v1.1 | 45555 | from genome assembly |
| Streptophyta | Medicago truncatula | Medicago | http://www.medicago.org/ | | 44830 | AF093506.1 |
| Streptophyta | Arabidopsis thaliana | TAIR | http://www.arabidopsis.org/index.jsp | | 27855 | X16077.1 |
| Streptophyta | Arabidopsis lyrata | JGI | http://www.jgi.doe.gov/genome-projects/ | | 32670 | from genome assembly |
| Streptophyta | Carica papaya | Carica | asgpb.mhpcc.hawaii.edu/papaya/ | | 24782 | from genome assembly |
| Streptophyta | Vitis vinifera | Genoscope | http://www.genoscope.cns.fr/ | | 30434 | from genome assembly |
| Chlorophyta | Micromonas pusilla CCMP1545 | JGI | http://www.jgi.doe.gov/genome-projects/ | V2 | 10242 | from genome assembly |
| Chlorophyta | Micromonas pusilla RCC299 | JGI | http://www.jgi.doe.gov/genome-projects/ | V3 | 10109 | from genome assembly |
| Chlorophyta | Ostreococcus lucimarinus | JGI | http://www.jgi.doe.gov/genome-projects/ | v2 | 7651 | from genome assembly |
| Chlorophyta | Ostreococcus RCC809 | JGI | http://www.jgi.doe.gov/genome-projects/ | v1 | 7773 | from genome assembly |
| Chlorophyta | Bathycoccus prasinos | Genoscope | http://bioinformatics.psb.ugent.be/ | V1 | 8747 | from genome assembly |
| Chlorophyta | Ostreococcus tauri | Bogas | http://bioinformatics.psb.ugent.be/ | v2 | 7725 | from genome assembly |
| Saccharomycetaceae | Saccharomyces cerevisiae | SGD | http://www.yeastgenome.org/ | | 5914 | Z75578 |
| Saccharomycetaceae | Saccharomyces paradoxus | MIT | http://www.broad.mit.edu/annotation/ | | 4774 | X97806 |
| Saccharomycetaceae | Saccharomyces mikatae | Broad | http://fungal.genome.duke.edu/ | | 5884 | AB040998 |
| Saccharomycetaceae | Saccharomyces kudriavzevii | WUSTL | http://fungal.genome.duke.edu/ | | 6371 | AACI02000378.1 |
| Saccharomycetaceae | Saccharomyces bayanus | MIT | http://www.broad.mit.edu/annotation/ | | 4492 | X97777 |
| Saccharomycetaceae | Saccharomyces castellii | WUSTL | http://fungal.genome.duke.edu/ | | 5864 | AACF01000230.1 |
| Saccharomycetaceae | Lachancea waltii | Genolevure | http://fungal.genome.duke.edu/ | | 5350 | AADM01000401.1 |
| Saccharomycetaceae | Lachancea thermotolerans | Genolevure | http://fungal.genome.duke.edu/ | | 5092 | X89526.1 |

Twenty-six phylogenetically independent comparisons were inferred from pairs of species with less than 5% 18S rDNA divergence (all species pairs, numbers of genes and phylogenies within each lineage are available in Figure S1). All orthologous gene pairs between species were inferred by reciprocal best hit (e-value 10^-3). We retrieved the common set of orthologous genes within each lineage by extracting the orthologous genes present in all pairwise species comparisons. We thus obtained 2151 common gene pairs in Chlorophyta, 5051 in Diptera, 2925 in Saccharomyceta, 4160 in Streptophyta and 5949 in Vertebrata. Protein sequences were aligned with the Needleman-Wunsch algorithm [9] and processed with custom C code to compute amino-acid identities over the concatenated alignments. Substitution rates dAA were estimated via maximum likelihood with the PAML package (Jones [10] substitution matrix) [11]. We manually inspected multiple sequence alignments to identify common sites of the 18S rDNA: large insertions occurring in some sequences were excluded from the alignment to obtain consistent divergence estimates across pairwise comparisons. All 18S rDNA pairs were aligned with the Needleman-Wunsch algorithm to estimate pairwise differences. The nucleotide substitution rates of the 18S rDNA were estimated with the PAML package (HKY85 substitution model). Statistical analyses were performed with the R software.
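The ortholog inference and identity computation described above can be sketched as follows. This is a minimal Python illustration, not the authors' custom C code: the best-hit tables, gene names and toy alignments are invented, and two details are assumptions (hits are kept when the e-value is at or below the cutoff, and alignment columns containing a gap are ignored when counting identities).

```python
from typing import Dict, List, Tuple

# Hypothetical best-hit table: query gene -> (best subject gene, e-value).
Hits = Dict[str, Tuple[str, float]]


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits,
                         max_evalue: float = 1e-3) -> Dict[str, str]:
    """Return pairs (gene in A -> gene in B) that are each other's best hit."""
    pairs = {}
    for gene_a, (gene_b, evalue) in a_vs_b.items():
        if evalue > max_evalue:
            continue  # forward hit too weak (assumed cutoff rule)
        back = b_vs_a.get(gene_b)
        if back is not None and back[0] == gene_a and back[1] <= max_evalue:
            pairs[gene_a] = gene_b
    return pairs


def concatenated_identity(alignments: List[Tuple[str, str]]) -> float:
    """Percent identical residues over all ungapped columns of all alignments."""
    matches = 0
    compared = 0
    for seq_x, seq_y in alignments:
        for x, y in zip(seq_x, seq_y):
            if x == "-" or y == "-":
                continue  # assumption: gap columns are not counted
            compared += 1
            if x == y:
                matches += 1
    return 100.0 * matches / compared


# Toy data, invented for illustration only.
a_vs_b = {"a1": ("b1", 1e-50), "a2": ("b2", 1e-20), "a3": ("b1", 1e-5)}
b_vs_a = {"b1": ("a1", 1e-48), "b2": ("a9", 1e-30)}
orthologs = reciprocal_best_hits(a_vs_b, b_vs_a)
identity = concatenated_identity([("MKT-LV", "MKTALV"), ("GAVL", "GSVL")])
```

Concatenating before computing identity weights each gene by its aligned length, so long proteins contribute more to the lineage-wide estimate than short ones.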

Results

The rate of 18S rDNA and protein evolution

Recent genome and metagenomic projects have highlighted a surprising discrepancy between 18S rDNA divergence and whole genome divergence in some phytoplanktonic species [12], [13], [14], [15] that are keystone players in global carbon cycling [16]. Here we investigated the generality of this observation among both unicellular and multicellular eukaryotes. We compared the 18S rDNA and the proteome divergence across all available eukaryotic genomes in 2 unicellular lineages (baker's yeast and green algae) and 3 multicellular lineages (vertebrates, Diptera and land plants). We found that for a given level of rDNA divergence, unicellular eukaryotes had substantially greater proteome divergence than multicellular eukaryotes (Figure 1A). This can be tested more formally using an analysis of covariance of proteome versus rDNA divergence, forcing the regression lines through the origin and testing for equality of slopes: the slopes differ highly significantly (p<0.0001) (Figure 1A). Identical 18S rDNA sequences between two unicellular species may correspond to proteome divergences of the same order as those observed between Xenopus and chicken, or between the poplar tree and the legume Medicago (Figure 1B). Amino-acid divergences between orthologous genes are only one of the many hallmarks of evolutionary divergence after speciation. A genomic species definition for protists based on proteome divergence is stringent, because genomic rearrangements, the acquisition of new genes via duplication or even a few mutations within a subset of genes may be sufficient to delineate two species [17], [18]. To reduce possible effects of amino-acid content, base composition and non-independence of observations, we computed the substitution rates on a common set of orthologs within each lineage across all independent pairwise comparisons.
Consistent with the raw estimates of the number of differences, the rate of evolution of the 18S rDNA relative to the proteome is much lower in unicellular species (analysis of covariance, unicellular versus multicellular: p = 0.048) (Figure 2).
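The slope comparison used in this section can be illustrated with a small sketch. The analyses themselves were run in R; the Python below is only an assumed equivalent of a test for equality of slopes when regression lines are forced through the origin, comparing one shared slope against one slope per group with an F statistic. The data points are invented and do not come from Figure 1 or Figure 2.

```python
from typing import List, Sequence, Tuple

Group = Tuple[Sequence[float], Sequence[float]]  # (x values, y values)


def slope_through_origin(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope b for y = b * x (no intercept) and the residual sum of squares."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = sxy / sxx
    rss = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, rss


def equal_slope_f(groups: List[Group]) -> Tuple[float, int, int]:
    """F statistic for 'one shared slope' versus 'one slope per group'."""
    all_x = [xi for x, _ in groups for xi in x]
    all_y = [yi for _, y in groups for yi in y]
    _, rss_shared = slope_through_origin(all_x, all_y)
    rss_separate = sum(slope_through_origin(x, y)[1] for x, y in groups)
    df_num = len(groups) - 1           # extra slopes in the larger model
    df_den = len(all_x) - len(groups)  # observations minus fitted slopes
    f_stat = ((rss_shared - rss_separate) / df_num) / (rss_separate / df_den)
    return f_stat, df_num, df_den


# Invented toy data: x = 18S rDNA differences (%), y = proteome differences (%).
unicellular = ([0.5, 1.0, 2.0], [20.0, 38.0, 82.0])
multicellular = ([0.5, 1.0, 2.0], [6.0, 9.0, 21.0])
slope_uni, _ = slope_through_origin(*unicellular)
slope_multi, _ = slope_through_origin(*multicellular)
f_stat, df_num, df_den = equal_slope_f([unicellular, multicellular])
```

A p-value can then be read from the upper tail of the F distribution with `df_num` and `df_den` degrees of freedom, for example with `scipy.stats.f.sf(f_stat, df_num, df_den)` if SciPy is available.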
Figure 1

18S rDNA versus proteome divergence in unicellular and multicellular lineages.

A. Average proteome (amino-acid) and 18S rDNA differences (%) for 21 unicellular and 26 multicellular pairwise comparisons. The upper limit of the first class of 18S rDNA sequence differences, 0.5%, is the smallest threshold used to delineate Operational Taxonomic Units (OTUs) in planktonic eukaryotes [26]. B. Selected examples of pairwise comparisons in each 18S rDNA divergence class: percent amino-acid divergence (percent 18S rDNA differences).

Figure 2

18S rDNA evolution rates versus amino-acid evolution rates for all common orthologous genes within lineages for independent pairs of species.

Yellow: Vertebrates, Green: Streptophytes, Light blue: Diptera, Light green: Chlorophyta, Red: Saccharomyceta.

Discussion

A population genetic explanation

What could cause this decoupling between 18S rDNA and proteome divergence in unicellular versus multicellular species? There are two general explanations. First, the proportion of mutations that are strongly deleterious in 18S rDNA, relative to protein sequences, could be higher in unicells than in multicells. One could argue that the 18S rDNA may be under much stronger selection in unicells, where fitness may depend more directly on transcription efficiency than in multicellular species. Second, the rate of adaptive evolution in protein sequences could be higher in unicells than in multicells. It is difficult to differentiate between these possibilities. However, unicells and multicells are likely to differ in their effective population sizes, and this suggests a simple explanation: the proportion of effectively neutral mutations changes more in response to differences in effective population size in the 18S rDNA than in the proteome. This can be formalised as follows. Let us assume that all mutations are deleterious (or effectively neutral) and that the distribution of fitness effects is a gamma distribution. Under a gamma distribution it can be shown that the rate of evolution, R, is a function of the mutation rate, μ, the divergence time, t, the distribution of fitness effects of new mutations, fully described by the shape parameter, β, and the effective population size, Ne [19], [20], [21]. We can thus express the ratio between the rate of evolution of the 18S rDNA, Rr, and the rate of evolution of the proteome, Rp, in one lineage as a function of three parameters, where N is the average effective population size within a lineage. This ratio can be estimated from our observations (Figure 2) by taking the linear regression coefficient for each lineage (slope = 0.017 for unicellular and slope = 0.059 for multicellular organisms).
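The two regression slopes can be turned into an implied difference between shape parameters with a short calculation. The sketch below assumes that the ratio of the unicellular to the multicellular slope scales as a power of the ratio of effective population sizes, with the difference between the two shape parameters as the exponent; this power-law form is an assumption made for illustration, and the population-size ratios of 10^3 and 10^6 are example values.

```python
import math

# Regression slopes quoted in the text.
SLOPE_UNICELLULAR = 0.017
SLOPE_MULTICELLULAR = 0.059


def exponent_difference(ne_ratio: float) -> float:
    """Exponent d such that ne_ratio ** d equals the ratio of the two slopes.

    ne_ratio is the assumed ratio of unicellular to multicellular
    effective population size (a hypothetical input).
    """
    slope_ratio = SLOPE_UNICELLULAR / SLOPE_MULTICELLULAR
    return math.log(slope_ratio) / math.log(ne_ratio)


# Example ratios of effective population size (assumed values).
deltas = {ratio: exponent_difference(ratio) for ratio in (1e3, 1e6)}
for ratio, delta in deltas.items():
    print(f"Ne ratio {ratio:.0e}: exponent difference {delta:.3f}")
```

Under these assumptions the exponent difference is about −0.18 for a thousand-fold ratio and about −0.09 for a million-fold ratio, so a larger assumed gap in population size requires a smaller difference in shape parameters to account for the same slope ratio.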
If we assume that unicells have an effective population size, Ne, that is 1000 to 1,000,000 times larger than that of multicells, then βr − βp would have to be between −0.2 and −0.1 to explain the differences in the regression slopes. So quite modest differences in the distribution of fitness effects and in effective population sizes can lead to substantial differences in the relative rates at which the 18S rDNA and protein coding sequences evolve. Recent estimates of βp for nuclear genes in humans and Drosophila are 0.2 and 0.35, respectively [22], [23], and we thus expect βr to take values smaller than 0.25. Large effective population sizes of unicellular eukaryotes may thus provide an explanation for the surprisingly low divergence of 18S rDNA relative to genome divergence. More generally, this conclusion applies to any barcoding gene sufficiently constrained to provide a large phylogenetic spread over the eukaryotic tree of life, suggesting that biodiversity studies have to make a trade-off between phylogenetic spread and phylogenetic depth for a given barcoding gene. Given the present diversity estimates of eukaryotic unicells from conserved barcoding genes like the 18S rDNA [24], [25], we thus anticipate that future eukaryotic planktonic metagenomic and genomic analyses will lead to an increase in the number of species.

Figure S1

Phylogenetic relationships and number of genes used for independent comparison. (TIFF)
  21 in total

1.  Oceanic 18S rDNA sequences from picoplankton reveal unsuspected eukaryotic diversity.

Authors:  S Y Moon-van der Staay; R De Wachter; D Vaulot
Journal:  Nature       Date:  2001-02-01       Impact factor: 49.962

2.  Unexpected diversity of small eukaryotes in deep-sea Antarctic plankton.

Authors:  P López-García; F Rodríguez-Valera; C Pedrós-Alió; D Moreira
Journal:  Nature       Date:  2001-02-01       Impact factor: 49.962

3.  The rapid generation of mutation data matrices from protein sequences.

Authors:  D T Jones; W R Taylor; J M Thornton
Journal:  Comput Appl Biosci       Date:  1992-06

4.  Wide genetic diversity of picoplanktonic green algae (Chloroplastida) in the Mediterranean Sea uncovered by a phylum-biased PCR approach.

Authors:  Manon Viprey; Laure Guillou; Martial Ferréol; Daniel Vaulot
Journal:  Environ Microbiol       Date:  2008-04-21       Impact factor: 5.491

5.  A general method applicable to the search for similarities in the amino acid sequence of two proteins.

Authors:  S B Needleman; C D Wunsch
Journal:  J Mol Biol       Date:  1970-03       Impact factor: 5.469

6.  PAML: a program package for phylogenetic analysis by maximum likelihood.

Authors:  Z Yang
Journal:  Comput Appl Biosci       Date:  1997-10

7.  Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies.

Authors:  Peter D Keightley; Adam Eyre-Walker
Journal:  Genetics       Date:  2007-12       Impact factor: 4.562

8.  The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation.

Authors:  Brian Palenik; Jane Grimwood; Andrea Aerts; Pierre Rouzé; Asaf Salamov; Nicholas Putnam; Chris Dupont; Richard Jorgensen; Evelyne Derelle; Stephane Rombauts; Kemin Zhou; Robert Otillar; Sabeeha S Merchant; Sheila Podell; Terry Gaasterland; Carolyn Napoli; Karla Gendler; Andrea Manuell; Vera Tai; Olivier Vallon; Gwenael Piganeau; Séverine Jancek; Marc Heijde; Kamel Jabbari; Chris Bowler; Martin Lohr; Steven Robbens; Gregory Werner; Inna Dubchak; Gregory J Pazour; Qinghu Ren; Ian Paulsen; Chuck Delwiche; Jeremy Schmutz; Daniel Rokhsar; Yves Van de Peer; Hervé Moreau; Igor V Grigoriev
Journal:  Proc Natl Acad Sci U S A       Date:  2007-04-25       Impact factor: 11.205

9.  The Sorcerer II Global Ocean Sampling Expedition: metagenomic characterization of viruses within aquatic microbial samples.

Authors:  Shannon J Williamson; Douglas B Rusch; Shibu Yooseph; Aaron L Halpern; Karla B Heidelberg; John I Glass; Cynthia Andrews-Pfannkoch; Douglas Fadrosh; Christopher S Miller; Granger Sutton; Marvin Frazier; J Craig Venter
Journal:  PLoS One       Date:  2008-01-23       Impact factor: 3.240

10.  Assessing the evolutionary impact of amino acid mutations in the human genome.

Authors:  Adam R Boyko; Scott H Williamson; Amit R Indap; Jeremiah D Degenhardt; Ryan D Hernandez; Kirk E Lohmueller; Mark D Adams; Steffen Schmidt; John J Sninsky; Shamil R Sunyaev; Thomas J White; Rasmus Nielsen; Andrew G Clark; Carlos D Bustamante
Journal:  PLoS Genet       Date:  2008-05-30       Impact factor: 5.917
