Filling the gaps in the genomic landscape.

David Williams, J Peter Gogarten, Pascal Lapierre.

Abstract

A new initiative provides comparative genomicists with a more complete picture of genome diversity. Here we discuss the improved sampling strategy.

Year:  2010        PMID: 20210981      PMCID: PMC2872866          DOI: 10.1186/gb-2010-11-2-103

Source DB:  PubMed          Journal:  Genome Biol        ISSN: 1474-7596            Impact factor:   13.583


Research highlight

The relentless progress in sequencing technology continues to open up new opportunities for biologists. Only 15 years after the first surveys of complete genomes of single organisms, findings from comparative genomics are commonplace, thanks to the more than 1,000 sequenced genomes now available. Among the most striking discoveries is the high level of variation in gene content between closely related microbial strains. However, a representative sample of genomes from cultured bacteria and archaea has, until very recently, been out of reach. Many previous sequencing efforts have focused on useful, dangerous or unusual microorganisms, providing a patchy sampling of the known phylogenetic diversity. A new initiative, the 'Genomic Encyclopedia of Bacteria and Archaea' (GEBA), reported recently by Wu et al. [1], aims to fill in the gaps to provide a more complete picture of genomic diversity. The initial stage of the project aims to complete 159 genomes across the Bacteria and the Archaea, selected according to their position in a phylogenetic tree of small subunit (SSU) rRNA, with in-depth sampling of the Actinobacteria. By analyzing 56 of the newly sequenced genomes, the authors demonstrate improvements in the rate of novel protein discovery and extend the diversity and distribution of known protein families - a clear indication of the success of the new sampling strategy. On this basis we can expect further revelations in the near future. Here we discuss the advantages of the new sampling strategy and its limitations in the light of the apparent non-tree-like histories of whole genomes inferred from recent comparative genomic studies.

Genome content and diversity

During the past decade, our understanding of evolution at the genomic level has been shaken to its core by many reports showing that genomes from closely related species can vary greatly in gene content. The rapid alteration of gene content in genomes was first demonstrated by Welch et al. [2] in a comparison of three strains of Escherichia coli. They found that only about 39% of the non-overlapping set of genes were present in all three strains; the majority of the genes had therefore either been gained through gene transfer and internal duplication, or lost along the evolutionary path of the different strains. The extreme plasticity of genome composition is illustrated by the comparison of genomes from three strains of Frankia, a genus of nitrogen-fixing soil bacteria whose members form symbiotic relationships with actinorhizal plants [3]. The largest of the three genomes contains almost twice as many ORFs as the smallest, a feature that can be associated with the range of plants each strain can infect and the geographic distribution of those plants. A measure of protein diversity among related species can be derived by looking at the pan-genome of the whole group - that is, the pool of genes present in the group collectively, including those that are not present in all individuals. Tettelin et al. [4] sequentially sampled genes from eight Group-B Streptococcus (GBS) genomes and concluded that, on the basis of the number of unique genes found in those eight genomes, one should expect to find an average of 30 new genes for every additional GBS genome sequenced. This was a striking finding because it implied an effectively unlimited number of genes in the pan-genome of GBS. When the concept was extended to the Bacteria more widely by analyzing 573 bacterial species using a gene frequency sampling approach [5], the number of expected unique genes per genome increased to an average of about 200, with no sign of leveling off.
Results from the comparisons of the 56 genomes in the GEBA project confirm the existence of a surprising number of previously unknown gene families. Wu et al. [1] found that these 56 genomes provided a discovery rate of more than 1,000 novel protein families per genome. By sequencing bacterial genomes from under-represented phyla, they revealed that currently recognized protein diversity is likely to represent only a small fraction of the diversity existing in nature.
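
The quantities discussed in this section (the core set shared by all strains, the pan-genome, and the number of new genes contributed by each additional genome) can be illustrated with a minimal Python sketch. The strain names and gene-family identifiers below are hypothetical toy data, not values from any of the cited studies, and averaging over every possible order of genome addition is a simplification chosen for illustration rather than the procedure used in [4] or [5].

```python
from itertools import permutations

# Hypothetical toy data: strain name -> set of gene-family identifiers.
genomes = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g5"},
    "strain_C": {"g1", "g3", "g5", "g6", "g7"},
}


def core_and_pan(genomes):
    """Return (core, pan): families present in every genome, and in any genome."""
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan


def mean_new_genes(genomes):
    """Mean number of previously unseen families contributed by the k-th genome.

    The mean is taken over all orders in which the genomes could be added,
    which is only feasible for a handful of genomes.
    """
    names = list(genomes)
    totals = [0] * len(names)
    orders = list(permutations(names))
    for order in orders:
        seen = set()
        for k, name in enumerate(order):
            totals[k] += len(genomes[name] - seen)  # families not seen so far
            seen |= genomes[name]
    return [total / len(orders) for total in totals]


core, pan = core_and_pan(genomes)
core_fraction = len(core) / len(pan)  # analogous to the ~39% reported in [2]
new_per_genome = mean_new_genes(genomes)
```

If the last entries of `new_per_genome` stay well above zero as more genomes are added, the pan-genome of the group shows no sign of closing, which is the pattern described above for GBS and for the Bacteria as a whole.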

Gene trees and genome networks

So how can this meta-genomic structure be modeled realistically when it comes to prokaryotic phylogenetics? Informed by the strong ancestral lineages seen in higher organisms, a tree-like model of evolution was originally extended to include microbial life by modeling sequence evolution in SSU rRNA [6]. This approach provided an early indication of the staggering diversity to be found among microorganisms and led to the classification of life into three domains. However, subsequent analyses of other gene families revealed clear incongruities between gene trees built from similar sets of organisms [7]. In the light of evidence from more recent analyses, the once clear lines of the tree model for the history of species and their genomes have become somewhat blurred [8]. Although Wu et al. [1] used an SSU rRNA tree-guided sampling approach, in their initial assessment of the first 56 GEBA genomes they found 1,768 out of 16,797 protein families (about 10.5%) with no significant sequence similarity to known proteins. Furthermore, when the 53 new bacterial genomes were compared with 53 randomly sampled, previously sequenced bacterial genomes, 2.8 to 4.4 times more phylogenetic diversity was observed for a concatenated alignment of 31 broadly conserved protein-coding genes. Anticipating genome content is a difficult problem. Wu et al. [1] demonstrated that the use of an SSU phylogenetic tree as a sampling guide provides a substantial improvement in new information per genome sequenced. Their analysis of 31 broadly conserved protein-coding gene families confirmed the utility of phylogenetic sampling in obtaining a richer sample of protein diversity. While such a tree provides some measure of average protein diversity [9], this average signal does not necessarily represent the history of the genomes. When genes with different histories are combined into a single supermatrix, the conflicting phylogenetic signals are likely to lead to artifacts in a tree-only reconstruction.
The resulting tree may be dominated by signals due to highways of gene sharing [8] between certain lineages and may not be representative of the history of the organism, its genome, or of a single major cellular component [10]. But how do we reconcile a tree-like relationship between whole organisms with the varied evolutionary history of individual genes found in genomes? One solution offered is to use a combination of genes that we know are more resilient to gene transfers and have a higher likelihood of reflecting the true evolutionary history of the organisms. Examples include genes coding for highly integrated cellular components such as the ribosome or ATP synthases, for which a tree-like history is more likely. Inferred histories of the remaining gene families can be added to provide a more accurate reconstruction of the network-like evolutionary history of genomes [10]. Wu et al. [1] have demonstrated that selecting genomes to sequence on a phylogenetic basis is a far more profitable use of resources in terms of diversity exploration than the previous, less coordinated approach. The GEBA initiative will thus provide the data necessary to answer important questions in microbiology sooner than would otherwise be possible. As the authors anticipate, the final piece of the puzzle will be effective means to sequence genomes from organisms lacking representatives in pure culture. When this is achieved we will be able to approach a complete picture of genomic diversity.
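
The tree-guided sampling strategy discussed in this section can be illustrated with a small Python sketch: given a rooted tree with branch lengths and a set of taxa that already have sequenced genomes, repeatedly pick the candidate that adds the most uncovered branch length. The tree, taxon names, branch lengths and function names are hypothetical, and this greedy rule is shown only as one simple way to maximize phylogenetic diversity; it is not presented as the selection procedure actually used by Wu et al. [1].

```python
# Hypothetical rooted tree: node -> (parent, length of the branch to the parent).
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 0.5),
    "C": ("n2", 2.0),
    "D": ("n2", 3.0),
    "n2": ("root", 1.0),
}


def path_to_root(tree, node):
    """Return the branches between a node and the root, each named by its child node."""
    branches = set()
    while node in tree:
        branches.add(node)
        node = tree[node][0]
    return branches


def phylogenetic_diversity(tree, taxa):
    """Total length of all branches connecting the given taxa to the root."""
    covered = set()
    for taxon in taxa:
        covered |= path_to_root(tree, taxon)
    return sum(tree[branch][1] for branch in covered)


def greedy_select(tree, candidates, already_sequenced, k):
    """Pick up to k candidates, each time adding the most uncovered branch length."""
    covered = set()
    for taxon in already_sequenced:
        covered |= path_to_root(tree, taxon)
    remaining = [c for c in candidates if c not in already_sequenced]
    chosen = []
    for _ in range(min(k, len(remaining))):
        def gain(taxon):
            new_branches = path_to_root(tree, taxon) - covered
            return sum(tree[branch][1] for branch in new_branches)

        best = max(remaining, key=gain)
        chosen.append(best)
        covered |= path_to_root(tree, best)
        remaining.remove(best)
    return chosen


picked = greedy_select(tree, ["A", "B", "C", "D"], already_sequenced=["A"], k=2)
pd_after = phylogenetic_diversity(tree, ["A"] + picked)
pd_all = phylogenetic_diversity(tree, ["A", "B", "C", "D"])
```

In this toy tree the close relative `B` of the already sequenced taxon `A` is picked last, which mirrors the argument that sampling guided by a tree avoids spending effort on lineages that are already well represented. As the text cautions, the diversity measured on a single tree is an average signal and need not reflect the history of every gene in the genome.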

Abbreviations

GBS: Group-B Streptococcus; GEBA: Genomic Encyclopedia of Bacteria and Archaea; ORF: open reading frame; SSU: small subunit.

References

1.  A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea.

Authors:  Dongying Wu; Philip Hugenholtz; Konstantinos Mavromatis; Rüdiger Pukall; Eileen Dalin; Natalia N Ivanova; Victor Kunin; Lynne Goodwin; Martin Wu; Brian J Tindall; Sean D Hooper; Amrita Pati; Athanasios Lykidis; Stefan Spring; Iain J Anderson; Patrik D'haeseleer; Adam Zemla; Mitchell Singer; Alla Lapidus; Matt Nolan; Alex Copeland; Cliff Han; Feng Chen; Jan-Fang Cheng; Susan Lucas; Cheryl Kerfeld; Elke Lang; Sabine Gronow; Patrick Chain; David Bruce; Edward M Rubin; Nikos C Kyrpides; Hans-Peter Klenk; Jonathan A Eisen
Journal:  Nature       Date:  2009-12-24

2.  Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli.

Authors:  R A Welch; V Burland; G Plunkett; P Redford; P Roesch; D Rasko; E L Buckles; S-R Liou; A Boutin; J Hackett; D Stroud; G F Mayhew; D J Rose; S Zhou; D C Schwartz; N T Perna; H L T Mobley; M S Donnenberg; F R Blattner
Journal:  Proc Natl Acad Sci U S A       Date:  2002-12-05

3.  Genome characteristics of facultatively symbiotic Frankia sp. strains reflect host range and host plant biogeography.

Authors:  Philippe Normand; Pascal Lapierre; Louis S Tisa; Johann Peter Gogarten; Nicole Alloisio; Emilie Bagnarol; Carla A Bassi; Alison M Berry; Derek M Bickhart; Nathalie Choisne; Arnaud Couloux; Benoit Cournoyer; Stephane Cruveiller; Vincent Daubin; Nadia Demange; Maria Pilar Francino; Eugene Goltsman; Ying Huang; Olga R Kopp; Laurent Labarre; Alla Lapidus; Celine Lavire; Joelle Marechal; Michele Martinez; Juliana E Mastronunzio; Beth C Mullin; James Niemann; Pierre Pujic; Tania Rawnsley; Zoe Rouy; Chantal Schenowitz; Anita Sellstedt; Fernando Tavares; Jeffrey P Tomkins; David Vallenet; Claudio Valverde; Luis G Wall; Ying Wang; Claudine Medigue; David R Benson
Journal:  Genome Res       Date:  2006-12-06

4.  Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome".

Authors:  Hervé Tettelin; Vega Masignani; Michael J Cieslewicz; Claudio Donati; Duccio Medini; Naomi L Ward; Samuel V Angiuoli; Jonathan Crabtree; Amanda L Jones; A Scott Durkin; Robert T Deboy; Tanja M Davidsen; Marirosa Mora; Maria Scarselli; Immaculada Margarit y Ros; Jeremy D Peterson; Christopher R Hauser; Jaideep P Sundaram; William C Nelson; Ramana Madupu; Lauren M Brinkac; Robert J Dodson; Mary J Rosovitz; Steven A Sullivan; Sean C Daugherty; Daniel H Haft; Jeremy Selengut; Michelle L Gwinn; Liwei Zhou; Nikhat Zafar; Hoda Khouri; Diana Radune; George Dimitrov; Kisha Watkins; Kevin J B O'Connor; Shannon Smith; Teresa R Utterback; Owen White; Craig E Rubens; Guido Grandi; Lawrence C Madoff; Dennis L Kasper; John L Telford; Michael R Wessels; Rino Rappuoli; Claire M Fraser
Journal:  Proc Natl Acad Sci U S A       Date:  2005-09-19

5.  Estimating the size of the bacterial pan-genome.

Authors:  Pascal Lapierre; J Peter Gogarten
Journal:  Trends Genet       Date:  2009-01-23

6.  Phylogenetic structure of the prokaryotic domain: the primary kingdoms.

Authors:  C R Woese; G E Fox
Journal:  Proc Natl Acad Sci U S A       Date:  1977-11

7.  Horizontal transfer of ATPase genes--the tree of life becomes a net of life.

Authors:  E Hilario; J P Gogarten
Journal:  Biosystems       Date:  1993

8.  Highways of gene sharing in prokaryotes.

Authors:  Robert G Beiko; Timothy J Harlow; Mark A Ragan
Journal:  Proc Natl Acad Sci U S A       Date:  2005-09-21

9.  Resource-aware taxon selection for maximizing phylogenetic diversity.

Authors:  Fabio Pardi; Nick Goldman
Journal:  Syst Biol       Date:  2007-06

10.  Trees in the web of life.

Authors:  Kristen S Swithers; J Peter Gogarten; Gregory P Fournier
Journal:  J Biol       Date:  2009-07-13