
The others: our biased perspective of eukaryotic genomes.

Javier del Campo, Michael E Sieracki, Robert Molestina, Patrick Keeling, Ramon Massana, Iñaki Ruiz-Trillo.

Abstract

Understanding the origin and evolution of the eukaryotic cell and the full diversity of eukaryotes is relevant to many biological disciplines. However, our current understanding of eukaryotic genomes is extremely biased, leading to a skewed view of eukaryotic biology. We argue that a phylogeny-driven initiative to cover the full eukaryotic diversity is needed to overcome this bias. We encourage the community: (i) to sequence a representative of the neglected groups available at public culture collections, (ii) to increase our culturing efforts, and (iii) to embrace single cell genomics to access organisms refractory to propagation in culture. We hope that the community will welcome this proposal, explore the approaches suggested, and join efforts to sequence the full diversity of eukaryotes.
Copyright © 2014 Elsevier Ltd. All rights reserved.


Keywords:  culture collections; culturing bias; ecology; eukaryotic genomics; eukaryotic tree of life; phylogeny; single cell genomics


Year: 2014; PMID: 24726347; PMCID: PMC4342545; DOI: 10.1016/j.tree.2014.03.006

Source DB: PubMed; Journal: Trends Ecol Evol; ISSN: 0169-5347; Impact factor: 17.712


The need for a phylogeny-driven eukaryotic genome project

Eukaryotes are the most complex of the three domains of life. The origin of the eukaryotic cell and its complexity remains one of the longest-debated questions in biology, famously described by Roger Stanier as the ‘greatest single evolutionary discontinuity’ in life [1]. Understanding how this complex cell originated and how it evolved into the diversity of forms we see today is therefore relevant to all biological disciplines, including cell biology, evolutionary biology, ecology, genetics, and biomedical research. Progress in this area relies heavily both on genome data from extant organisms and on an understanding of their phylogenetic relationships. Genome sequencing is a powerful tool for understanding the complexity of eukaryotes and their evolutionary history. However, a significant bias in eukaryotic genomics impoverishes our understanding of eukaryotic diversity and skews our view of what eukaryotes even are, as well as of their role in the environment. This bias is simple and widely recognized: most genomics focuses on multicellular eukaryotes and their parasites. The problem is not exclusive to eukaryotes. The launch of the so-called ‘Genomic Encyclopedia of Bacteria and Archaea’ [2] has begun to reverse a similar bias within prokaryotes, but there is currently no equivalent for eukaryotes. Targeted efforts have recently been initiated to broaden our genomic knowledge of several specific eukaryotic groups, but again these tend to focus on animals [3], plants [4], fungi [5], their parasites [6], or opisthokont relatives of animals and fungi [7]. Unfortunately, a phylogeny-driven initiative to sequence eukaryotic genomes specifically to cover the breadth of their diversity is lacking.
The tools already exist to overcome these biases and fill in the eukaryotic tree, and we therefore hope that researchers will be inspired to explore these tools and embrace the prospect of working towards a community-driven initiative to sequence the full diversity of eukaryotes.

The multicellular effect

It is not surprising that the first and main bias in the study of eukaryotes arises from our anthropocentric view of life. More than 96% of described eukaryotic species are Metazoa (animals), Fungi, or Embryophyta (land plants) [8] (Figure 1A); we call these the ‘big three’ of multicellular organisms, even though the Fungi also include unicellular members such as the yeasts. However, these lineages represent only 62% of the 18S rDNA (see Glossary) sequences in GenBank (Figure 1B), which is itself a biased sample, and only 23% of all operational taxonomic units (OTUs) in environmental surveys (Figure 1C). This bias is not new: research has historically focused on these three paradigmatic eukaryotic kingdoms, which are indeed important but are also simply more conspicuous and familiar to us. In genomics the bias is amplified considerably: 85% of the completed or in-progress genome projects listed in the Genomes OnLine Database (GOLD) [9] belong to the ‘big three’ (Figure 1D). Moreover, there are biases even within these groups; for example, many diverse invertebrate groups lack genomic data as badly as microbial groups do. This bodes poorly if we aim to understand and appreciate the complete eukaryotic tree of life. If we do not change this trend, we risk neglecting the majority of eukaryotic diversity in future genomic or metagenomic ecological and evolutionary studies, which would leave us with a far from realistic picture.
Figure 1

Relative representation of metazoans, fungi, and land plants versus all the other eukaryotes in different databases. (A) Relative numbers of described species according to the CBOL ProWG (n = 2 001 573). (B) Relative numbers of 18S rDNA OTU97 in GenBank (n = 22 475). (C) Relative number of environmental 18S rDNA OTU97 in GenBank (n = 1165). (D) Relative number of species with a genome project completed or in progress according to GOLD, per eukaryotic group (n = 1758). Data in panels A–C are from [8]. Abbreviations: CBOL ProWG, Consortium for the Barcode of Life Protist Working Group; GOLD, Genomes OnLine Database; OTU97, operational taxonomic unit (>97% sequence identity).
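Figures 1 to 4 count diversity as OTU97, that is, clusters of 18S rDNA sequences sharing more than 97% identity. The clustering procedure is not described here, so the following is only an illustrative sketch under stated assumptions: sequences are already aligned to equal length, identity is the fraction of matching positions, clustering is a plain greedy centroid scheme, and a pair at exactly 97% is treated as matching. Dedicated tools such as CD-HIT and UCLUST also cluster greedily around centroids, but with their own alignment and ordering rules; the function names `identity` and `greedy_otus` are hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otus(seqs: Dict[str, str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Hypothetical greedy centroid clustering.

    Sequences are visited in input order (real pipelines usually sort by
    abundance or length first). Each one joins the first existing centroid it
    matches at >= threshold identity; otherwise it founds a new OTU.
    """
    otus: Dict[str, List[str]] = {}  # centroid id -> member ids (centroid included)
    for seq_id, seq in seqs.items():
        for centroid_id, members in otus.items():
            if identity(seq, seqs[centroid_id]) >= threshold:
                members.append(seq_id)
                break
        else:  # no centroid was close enough: start a new OTU
            otus[seq_id] = [seq_id]
    return otus


if __name__ == "__main__":
    # Toy, made-up 100-bp "sequences", not real 18S rDNA data.
    toy = {
        "s1": "A" * 100,
        "s2": "A" * 98 + "CC",  # 98% identical to s1
        "s3": "C" * 100,        # 0% identical to s1
    }
    print(greedy_otus(toy))  # {'s1': ['s1', 's2'], 's3': ['s3']}
```

Because a greedy scheme depends on the order in which sequences are visited, the same data can yield somewhat different OTU sets under different orderings, one reason OTU counts from different pipelines are not strictly comparable.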

The ‘multicellular bias’ is the most serious, but it is not the only one. The eukaryotic groups with the most species deposited in culture collections and/or genome projects are also biased towards groups that are mainly phototrophic or that are parasitic and/or economically important (Figure 2). For example, both Archaeplastida and Stramenopila have more cultured species than other eukaryotes, as a result of a long phycological tradition and well-stocked phycological culture collections [10], and also because they are easier to maintain in culture than heterotrophs. In both cases this translates into a comparatively large number of genome projects: several genomic studies target photosynthetic stramenopiles [11,12] and, owing to their economic relevance in agriculture, the peronosporomycetes [13]. In addition, the apicomplexans within the Alveolata are relatively well studied at the genomic level because they include important human and animal parasites [14] such as Plasmodium and Toxoplasma. If we count strains rather than species, these biases become even stronger (Figure 3), because a significant proportion of the retrieved cultures and genomes correspond to different strains of the same dominant species. We therefore have a pool of species that have been redundantly cultured and sequenced.
Figure 2

Relative representation of eukaryotic supergroup diversity in different databases (excluding metazoans, fungi, and land plants). (A) Percentage of described species per eukaryotic supergroup according to the CBOL ProWG. (B) Percentage of 18S rDNA OTU97 per eukaryotic supergroup in GenBank. (C) Percentage of environmental 18S rDNA OTU97 per eukaryotic supergroup. (D) Percentage of species with a cultured strain in any of the analyzed culture collections. Culture data are from five large protist culture collections (n = 3084): the American Type Culture Collection, the Culture Collection of Algae and Protozoa [24], the Roscoff Culture Collection [25], the National Center for Marine Algae and Microbiota [26], and the Culture Collection of Algae at Göttingen University [27]. (E) Relative numbers of species with a genome project completed or in progress according to GOLD, per eukaryotic group. Data in panels A–C are from [8]. Data in panels D and E are publicly available, and the taxonomic analysis can be found in the supplementary data online. Abbreviations: CBOL ProWG, Consortium for the Barcode of Life Protist Working Group; Env 18S, environmental 18S rDNA sequences; GOLD, Genomes OnLine Database; OTU97, operational taxonomic unit (>97% sequence identity).

Figure 3

Eukaryotic diversity distribution among the analyzed databases. (A) The 25 species with the most strains represented in the analyzed culture collections. (B) The 25 speciesᵃ with the most ongoing genome projects. (C) The 25 most abundant SAG OTU97s in the analyzed dataset. Abbreviations: MAST, marine stramenopile; OTU97, operational taxonomic unit (>97% sequence identity); SAG, single amplified genome.

ᵃSome strains are not described at the species level and have been grouped by genus; they may therefore represent more than a single species.

The missing branches of the eukaryotic tree of life

Although we lack an incontrovertible, detailed phylogenetic tree of the eukaryotes, a consensus tree is emerging thanks to molecular phylogenies [15]. The five monophyletic supergroups of eukaryotes are summarized in Box 1. The distribution of cultured and sequenced species across the tree provides a broad overview of our current knowledge of eukaryotic diversity (Figure 4). A quarter of the represented lineages lack even a single culture in any of the analyzed culture collections and, notably, 51% of them lack a genome. The most important gaps are within the Rhizaria, the Amoebozoa, and the Stramenopila, where many lineages are still underrepresented. Yet many lineages without a single representative genome sequence are also found in the relatively well-described Opisthokonta and Excavata. This map is probably incomplete, because some genome projects may not be listed in GOLD and many cultures are not deposited in culture collections, but the overall trends probably give an accurate picture of the biases we currently face.
Figure 4

The tree of eukaryotes, showing the distribution of current effort on culturing, genomics, and environmental single amplified genome (SAG) genomics for the main protistan lineages. Eukaryotic schematic tree representing major lineages. Colored branches represent the seven main eukaryotic supergroups, whereas grey branches are phylogenetically contentious taxa. The sizes of the dots indicate the proportion of species/OTU97 in each database. Culture data are from the analyzed publicly available protist culture collections (n = 3084). Genome data were extracted from the Genomes OnLine Database (GOLD) (n = 258) [9]. SAGs of OTU97 correspond to those retrieved during the Tara Oceans cruise (n = 158) (M.E.S., unpublished data). Taxonomic annotation of all datasets is based on [28]. The ‘big three’ (in bold) have been excluded from this analysis. Abbreviation: OTU97, operational taxonomic unit (>97% sequence identity).
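The per-lineage tallies summarized in Figure 4, and the first approach argued for in the text (sequence cultured lineages that still lack a genome), amount to simple presence/absence bookkeeping. The sketch below assumes a hypothetical table mapping each lineage to two flags: whether it has a culture in the analyzed collections and whether it has a genome project. The `Coverage` record, the `coverage_summary` function, and the lineage names and flags are all placeholders for illustration, not data or code from this study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Coverage:
    """Hypothetical presence/absence record for one lineage."""
    cultured: bool  # at least one strain in the analyzed culture collections
    genome: bool    # at least one genome project, completed or in progress


def coverage_summary(tree: Dict[str, Coverage]) -> dict:
    """Fractions of lineages lacking a culture or a genome, plus two target lists:
    cultured-but-unsequenced lineages (candidates for immediate sequencing) and
    lineages with neither (candidates for new culturing or single cell genomics)."""
    n = len(tree)
    if n == 0:
        raise ValueError("no lineages to summarize")
    no_culture: List[str] = [name for name, c in tree.items() if not c.cultured]
    no_genome: List[str] = [name for name, c in tree.items() if not c.genome]
    return {
        "frac_no_culture": len(no_culture) / n,
        "frac_no_genome": len(no_genome) / n,
        "sequence_next": sorted(name for name, c in tree.items() if c.cultured and not c.genome),
        "culture_or_scg": sorted(set(no_culture) & set(no_genome)),
    }


if __name__ == "__main__":
    # Placeholder lineages and flags, for illustration only.
    toy_tree = {
        "lineage_A": Coverage(cultured=True, genome=True),
        "lineage_B": Coverage(cultured=True, genome=False),
        "lineage_C": Coverage(cultured=False, genome=False),
        "lineage_D": Coverage(cultured=True, genome=False),
    }
    print(coverage_summary(toy_tree))
```

Splitting the gaps this way separates lineages that only need sequencing effort from those that first need culturing or single cell approaches.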

Filling the gaps: how to

Although there may be no bad choices when selecting organisms for genome sequencing, there are certainly better choices if we aim to understand eukaryotic diversity. We argue that at least some of the effort should be directed specifically towards filling the gaps in the eukaryotic tree of life, focusing on lineages that occupy key phylogenetic positions. How can that be done? One option is to sequence more cultured organisms. In fact, 95% of protist species in culture are not yet targeted by a genome project (Figure S1 in the supplementary data online). Thus, by sequencing some of the available cultured lineages that still lack a genome, we could easily fill some of the important gaps in the tree, including some heterotrophic Stramenopila, Amoebozoa, and Rhizaria. However, restricting ourselves to species available in culture introduces a strong bias of its own, and most lineages remain without any cultured representative [16]. Publicly accessible protist collections [such as the American Type Culture Collection (ATCC) and the Culture Collection of Algae and Protozoa (CCAP); summarized in Box 2] are considerably smaller than their bacterial or fungal counterparts. One reason is that, unlike for bacteria [17], there is no requirement to systematically deposit cultures of newly described taxa. Notably, and unfortunately, half of the species with genome projects completed or in progress are not deposited in any of the five analyzed publicly accessible culture collections. To avoid more ‘lost cultures’ in the future, the community should establish and adopt standard procedures, similar to those used in bacteriology, for depositing cultures in protist collections. The whole community will benefit from this in both the short and the long term. In addition, there is an inherent technical bias in culturing, as well as a bias in culturing effort.
For example, phototrophic representatives of Stramenopila and Alveolata tend to have more cultures available than their heterotrophic counterparts (Figure 4). Indeed, 70.6% of the most common protist strains in culture collections are phototrophic organisms (Figure 3). There is therefore a need both to extend culturing efforts to a wider variety of environments and to develop novel and alternative culture techniques to retrieve refractory organisms [18], both of which take time, energy, and funding. Importantly, culture collections will need to be supported so that they can take on the challenge of maintaining more cultures and broaden their scope to include more difficult organisms that tend to be excluded from existing collections, in particular heterotrophs. A complementary way to increase the breadth of eukaryotic genomics is single cell genomics (SCG) [19]. Although the technology is still developing, it is probably the best way we have today to retrieve genomic information from abundant microbial eukaryotes that are ecologically relevant but refractory to culturing. For example, the single amplified genomes (SAGs) obtained from oceanic sites worldwide during the Tara Oceans cruise (M.E.S., unpublished data) fill reasonably well the culture and genome gaps of some of the most abundant groups in the oceans (Figure 4). In particular, a significant fraction of the SAGs correspond to uncultured organisms such as the marine stramenopiles MAST-4 and MAST-7 [20], chrysophyte groups H and G [21], and the Syndiniales [22]. Importantly, sequence tagging shows that only 10% of the SAGs are present in any culture collection, and only 2.5% have an ongoing genome project (based on cultured taxa). Note, however, that the SAGs available so far represent only marine microeukaryotes. Thus, although the analyzed SAGs certainly overcome part of the bias, they do not cover the full diversity of eukaryotes.
Given the potential of SAGs to further improve our understanding of eukaryotic diversity, an important question is whether high-quality genome data can be acquired from them [19]. At present, outcomes vary widely, owing to the bias introduced by the whole-genome amplification procedure. The completeness of the retrieved genome ranges from less than 10% to a complete genome, depending on the intrinsic properties of the cell studied as well as on the amplification method [23]. Culture certainly provides a more reliable way to obtain a high-quality genome at present, and a species in culture also gives researchers a direct window onto the biology of the organism and enables post-genomic research. Autecological experiments, ultrastructure analyses, and even functional experiments can all be performed on cultures, providing a deeper context for the genome and the organism. However, given the lack of data we currently face, and the unlikelihood that a significant increase in resources for cultivation will appear soon, we argue strongly that genomic sequencing of SAGs is an important complement to culture-based research in furthering our understanding of eukaryotic diversity.

Make the tree thrive: a call to action

Genome sequences have cast invaluable light on the classification of organisms, notably in many cases where particular species had been misclassified (Box 3). Moreover, eukaryotic genome sequences do not inform us only about the biology of the particular organism; they also make significant contributions to our understanding of eukaryotic biology in general, and of large-scale evolutionary and ecological processes. For this potential to be fully realized, however, we must sample broadly, and there are currently important gaps in the diversity of eukaryotic genome sequences that undermine our efforts to capitalize on it. Understanding the whole of eukaryotic diversity will doubtless help answer specific biological questions, including some of our more pernicious problems in medicine, agriculture, evolution, and ecology. We propose that filling in the eukaryotic tree at the genomic level, based on phylogenetic diversity, should be a priority for the community. We also argue that this can be achieved through a combination of three complementary approaches. First, at least one genome should be sequenced from each underrepresented lineage for which cultures are available. This is a straightforward task, requiring phycologists, protistologists, culture collection curators, and genome sequencing centers to coordinate their efforts and expertise to choose the best target taxa and sequencing strategies. Second, efforts to culture diverse organisms should be supported by sampling additional areas of the planet, by developing novel techniques to include more recalcitrant species (especially heterotrophs), and by rewarding this difficult but essential task, especially among younger researchers, before they conclude en masse that such crucial work is a professional dead-end.
Such efforts are time-consuming and have a built-in failure rate that makes them risky; policy changes would therefore help funding agencies, universities, and research centers to recognize the value of such work independently of its publication outcome. Finally, microbial ecologists and genome centers should embrace SCG and continue to improve the technology, which we believe will be the key to filling in missing parts of the tree in the short term. To coordinate all these efforts, funding agencies should also support the development of community resources such as publicly accessible culture collections and the maintenance of key taxa that are difficult to keep. We believe strongly that the time is ripe to reverse the genome sequencing bias in the tree of eukaryotes. We now have in hand all the elements needed to change this skewed view and further our understanding of eukaryotic biology and evolution. All that is still needed is the will and a jointly coordinated initiative. We therefore hope that the eukaryotic community will welcome this proposal to build a representative and diverse ‘Genomic Encyclopedia of Eukaryotes’ and collaborate to make it happen.
References (10 of 34 shown; numbered independently of the in-text citations)

1.  Culturing bias in marine heterotrophic flagellates analyzed through seawater enrichment incubations.

Authors:  Javier del Campo; Vanessa Balagué; Irene Forn; Itziar Lekunberri; Ramon Massana
Journal:  Microb Ecol       Date:  2013-06-11       Impact factor: 4.552

2.  Widespread occurrence and genetic diversity of marine parasitoids belonging to Syndiniales (Alveolata).

Authors:  L Guillou; M Viprey; A Chambouvet; R M Welsh; A R Kirkham; R Massana; D J Scanlan; A Z Worden
Journal:  Environ Microbiol       Date:  2008-09-02       Impact factor: 5.491

3.  Re-examining alveolate evolution using multiple protein molecular phylogenies.

Authors:  Naomi M Fast; Lingru Xue; Scott Bingham; Patrick J Keeling
Journal:  J Eukaryot Microbiol       Date:  2002 Jan-Feb       Impact factor: 3.346

4.  Taming the smallest predators of the oceans.

Authors:  Javier del Campo; Fabrice Not; Irene Forn; Michael E Sieracki; Ramon Massana
Journal:  ISME J       Date:  2012-07-19       Impact factor: 10.302

5.  The Phaeodactylum genome reveals the evolutionary history of diatom genomes.

Authors:  Chris Bowler; Andrew E Allen; Jonathan H Badger; Jane Grimwood; Kamel Jabbari; Alan Kuo; Uma Maheswari; Cindy Martens; Florian Maumus; Robert P Otillar; Edda Rayko; Asaf Salamov; Klaas Vandepoele; Bank Beszteri; Ansgar Gruber; Marc Heijde; Michael Katinka; Thomas Mock; Klaus Valentin; Fréderic Verret; John A Berges; Colin Brownlee; Jean-Paul Cadoret; Anthony Chiovitti; Chang Jae Choi; Sacha Coesel; Alessandra De Martino; J Chris Detter; Colleen Durkin; Angela Falciatore; Jérome Fournet; Miyoshi Haruta; Marie J J Huysman; Bethany D Jenkins; Katerina Jiroutova; Richard E Jorgensen; Yolaine Joubert; Aaron Kaplan; Nils Kröger; Peter G Kroth; Julie La Roche; Erica Lindquist; Markus Lommer; Véronique Martin-Jézéquel; Pascal J Lopez; Susan Lucas; Manuela Mangogna; Karen McGinnis; Linda K Medlin; Anton Montsant; Marie-Pierre Oudot-Le Secq; Carolyn Napoli; Miroslav Obornik; Micaela Schnitzler Parker; Jean-Louis Petit; Betina M Porcel; Nicole Poulsen; Matthew Robison; Leszek Rychlewski; Tatiana A Rynearson; Jeremy Schmutz; Harris Shapiro; Magali Siaut; Michele Stanley; Michael R Sussman; Alison R Taylor; Assaf Vardi; Peter von Dassow; Wim Vyverman; Anusuya Willis; Lucjan S Wyrwicz; Daniel S Rokhsar; Jean Weissenbach; E Virginia Armbrust; Beverley R Green; Yves Van de Peer; Igor V Grigoriev
Journal:  Nature       Date:  2008-10-15       Impact factor: 49.962

6.  The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata.

Authors:  Ioanna Pagani; Konstantinos Liolios; Jakob Jansson; I-Min A Chen; Tatyana Smirnova; Bahador Nosrat; Victor M Markowitz; Nikos C Kyrpides
Journal:  Nucleic Acids Res       Date:  2011-12-01       Impact factor: 16.971

7.  New insights into the diversity of marine picoeukaryotes.

Authors:  Fabrice Not; Javier del Campo; Vanessa Balagué; Colomban de Vargas; Ramon Massana
Journal:  PLoS One       Date:  2009-09-29       Impact factor: 3.240

8.  From pathogen genomes to host plant processes: the power of plant parasitic oomycetes.

Authors:  Marina Pais; Joe Win; Kentaro Yoshida; Graham J Etherington; Liliana M Cano; Sylvain Raffaele; Mark J Banfield; Alex Jones; Sophien Kamoun; Diane G O Saunders
Journal:  Genome Biol       Date:  2013-06-28       Impact factor: 13.583

9.  Phylogenomics reshuffles the eukaryotic supergroups.

Authors:  Fabien Burki; Kamran Shalchian-Tabrizi; Marianne Minge; Asmund Skjaeveland; Sergey I Nikolaev; Kjetill S Jakobsen; Jan Pawlowski
Journal:  PLoS One       Date:  2007-08-29       Impact factor: 3.240

10.  Evolution and classification of myosins, a paneukaryotic whole-genome approach.

Authors:  Arnau Sebé-Pedrós; Xavier Grau-Bové; Thomas A Richards; Iñaki Ruiz-Trillo
Journal:  Genome Biol Evol       Date:  2014-02       Impact factor: 3.416
