
Metagenomics reveals our incomplete knowledge of global diversity.

Miguel Pignatelli, Gabriel Aparicio, Ignacio Blanquer, Vicente Hernández, Andrés Moya, Javier Tamames.   

Year:  2008        PMID: 18625611      PMCID: PMC2530889          DOI: 10.1093/bioinformatics/btn355

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


Metagenomic sequencing obtains huge numbers of sequences from environmental and clinical samples, thus providing a glimpse of the global prokaryotic diversity of both species and genes in these sources. The current trend in metagenomic analysis follows the so-called gene-centric approach, which describes environments through the functional roles of the proteins encoded in the sequenced genes. Metagenomic analysis therefore relies heavily on accurate knowledge of the universe of proteins stored in the databases. It is known that the composition of the databases is biased (they are rich in sequences from common, cultivable and easily accessible organisms), but it is uncertain how large this bias is and how it influences the analysis of real data, that is, how accurately the databases describe global diversity.
In addition to the functional assignment of proteins, database completeness can also greatly influence the taxonomic classification of metagenomic sequences (binning). Accurate taxonomic assignments would be essential, since they would greatly help us to understand community dynamics, to predict the effect of changes in community composition, and to study key issues in the evolution of the community, such as the extent of horizontal gene transfer (HGT) or the barriers shaping the species. The analysis of metagenomic sequences without taxonomic assignment will always provide a superficial and incomplete view. But binning is difficult for metagenomic sequences, since they are usually very short and lack enough information to be classified by compositional features and/or phylogenetic analysis (Tamames and Moya, 2008). Although some bioinformatics tools have been proposed for binning, they provide good results for only a small fraction of the sequences (Krause et al., 2008; Mavromatis et al., 2007).
To deal with this problem, several authors have assigned ORFs to the taxon of their closest relatives in a homology search (Tringe et al., 2008; Venter et al., 2004). This strategy is potentially problematic: it may often fail for poorly known taxa (as is often the case for metagenomic samples), and it can easily be confounded by HGT events (Koski and Golding, 2001). We have performed a simple experiment to explore: (1) our current knowledge of the universe of sequences, and how this knowledge has evolved in the past years, and (2) the possible extent of failures when taxonomically assigning ORFs to their closest relatives. Using the Blast in Grids (BiG) service (Aparicio et al., 2007), we ran BLASTX searches for several metagenomes against release 159 (April 2007) of the GenBank non-redundant protein database, extracted the homologues found for each putative ORF [following the procedure described in Tamames and Moya (2008)] and assigned each ORF to the taxon of its best hit (using different thresholds of minimum identity). Next, we collected the creation dates of the GenBank entries of these hits. In this way, we can simulate the results that we would have obtained in the past by restricting the list of homologues to those already present in the database on a particular date, which allows us to explore how the results change as the database grows. The experiment was repeated for different metagenomes and different taxonomic depths. One of the results is shown in Figure 1A, for a farm soil metagenome (Tringe et al., 2005). Each row corresponds to a single ORF, coloured according to the class to which it would have been assigned on different dates. The plot clearly shows that many of the assignments have changed, even in recent times (between 10% and 20% of the assignments at class level have varied in the last 2 years).
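The date-restricted best-hit assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the `Hit` record, its field names, the function names, and the use of bit score to rank hits are all hypothetical choices, and the taxon labels in the example are invented for one imaginary ORF.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Hit:
    """One BLASTX hit for an ORF (hypothetical record layout)."""
    taxon: str        # taxon of the hit protein at the chosen rank
    identity: float   # percent identity between query and hit
    bitscore: float   # used to rank hits; the ranking criterion is an assumption
    created: date     # creation date of the database entry


def best_hit_taxon(hits: Sequence[Hit], cutoff: date,
                   min_identity: float = 60.0) -> Optional[str]:
    """Taxon of the best hit among entries that already existed at `cutoff`.

    Returns None when no hit passes both filters, i.e. the ORF would
    have been left unassigned at that date.
    """
    eligible = [h for h in hits
                if h.created <= cutoff and h.identity >= min_identity]
    if not eligible:
        return None
    return max(eligible, key=lambda h: h.bitscore).taxon


def assignment_history(hits: Sequence[Hit], cutoffs: Sequence[date],
                       min_identity: float = 60.0) -> List[Optional[str]]:
    """Assignment the ORF would have received at each cutoff date."""
    return [best_hit_taxon(hits, c, min_identity) for c in cutoffs]


# Invented hits for a single ORF, for illustration only.
example_hits = [
    Hit("Firmicutes", 45.0, 80.0, date(1999, 1, 1)),       # below the identity threshold
    Hit("Proteobacteria", 62.0, 150.0, date(2001, 5, 1)),
    Hit("Acidobacteria", 85.0, 300.0, date(2006, 6, 1)),
]
example_cutoffs = [date(2000, 1, 1), date(2003, 1, 1), date(2007, 4, 1)]
print(assignment_history(example_hits, example_cutoffs))
```

Each element of the returned list corresponds to one cutoff date, so a row of Figure 1A can be thought of as one such history for one ORF.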
This can be easily seen in Figure 1B, which shows the accumulated number of changes with respect to previous dates. Several trends are noticeable: (1) The rate of change has not decreased in recent times; instead, it clearly increases in most cases. This trend can be noticed even for broad taxonomic ranks such as phylum: for instance, the phylum Acidobacteria is now recognized as one of the most abundant taxa in many soils (Barns et al., 1999), but until recently no sequences were assigned to it. This indicates that the full diversity of these communities is still not well described in the current databases, and that the best-hit approach to taxonomic classification is risky at best. (2) Although most changes consist of the assignment of previously unclassified ORFs, the classification of many ORFs has also changed. (3) Abrupt changes occur in response to the availability of complete genomes, especially from species closely related to those represented in the metagenomes. Again for Acidobacteria, the few assignments coincide with the release of the only two completed genomes for this taxon, which were sequenced in 2006. This illustrates how strongly genome sequencing influences our knowledge of the universe of proteins, and calls for a sustained effort to sequence more genomes from poorly known taxa.
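The bookkeeping behind such an accumulated-change curve might look like the following sketch. It assumes each ORF already has a list of assignments, one per time point, with `None` meaning unassigned; the function name, the data layout, and the example values are hypothetical, and the sketch counts change events cumulatively, which is one plausible reading of the plotted quantity rather than the authors' exact definition.

```python
from typing import Dict, List, Optional, Tuple

History = List[Optional[str]]  # one assignment per time point; None = unassigned


def cumulative_change_ratios(histories: Dict[str, History]) -> List[Tuple[float, float]]:
    """Cumulative (new assignments, reassignments) per ORF over time.

    For every pair of consecutive time points, a change from None to a
    taxon counts as a new assignment, and a change from one taxon to a
    different taxon counts as a reassignment. Both running totals are
    divided by the total number of ORFs.
    """
    n_orfs = len(histories)
    if n_orfs == 0:
        return []
    n_points = len(next(iter(histories.values())))
    new_total = 0
    changed_total = 0
    ratios: List[Tuple[float, float]] = []
    for t in range(1, n_points):
        for history in histories.values():
            before, after = history[t - 1], history[t]
            if before == after:
                continue
            if before is None:
                new_total += 1
            elif after is not None:
                changed_total += 1
            # A taxon reverting to None is ignored: with a database that
            # only grows, an existing assignment is not expected to vanish.
        ratios.append((new_total / n_orfs, changed_total / n_orfs))
    return ratios


# Invented histories for four ORFs at three time points.
example = {
    "orf1": [None, "Proteobacteria", "Acidobacteria"],
    "orf2": [None, None, "Firmicutes"],
    "orf3": ["Proteobacteria", "Proteobacteria", "Proteobacteria"],
    "orf4": ["Firmicutes", "Proteobacteria", "Proteobacteria"],
}
print(cumulative_change_ratios(example))  # [(0.25, 0.25), (0.5, 0.5)]
```

Keeping the two counters separate mirrors the distinction drawn in the text between previously unclassified ORFs that gain an assignment and ORFs whose classification changes.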
Fig. 1.

(A) Assignment of a set of ORFs from the farm soil metagenome. We analyzed 60 000 randomly selected sequences from the full metagenome, using BLASTX searches and a threshold of 60% minimum identity between query and hit proteins. Each row in the plot corresponds to a single ORF, with colours showing how its assignment has varied over time (colours indicate different taxonomic classes). (B) Accumulated number of ORFs that changed their assignment between consecutive months (expressed as the ratio of the number of changes to the total number of ORFs), for different metagenomes (blue, whale fall; red, human gut; green, farm soil; black, Sargasso Sea). The changes are divided into new assignments (the ORF was previously unassigned; dashed lines) and assignment changes (assignment to a different taxon; solid lines). The figure also highlights three dates on which many assignments change, coinciding with the release of particular complete genomes of importance for the description of these microbial communities.

We hope that these results help to clarify the constraints that the information currently available in the databases imposes on the analysis of metagenomic data, and to improve current strategies for metagenomic annotation. Additional plots for different metagenomes and taxonomic ranks can be found on our web page: http://metagenomics.uv.es/Supp/BI-2008-metagenomics/suppl.html.
References

1.  The closest BLAST hit is often not the nearest neighbor.

Authors:  L B Koski; G B Golding
Journal:  J Mol Evol       Date:  2001-06       Impact factor: 2.395

2.  Wide distribution and diversity of members of the bacterial kingdom Acidobacterium in the environment.

Authors:  S M Barns; S L Takala; C R Kuske
Journal:  Appl Environ Microbiol       Date:  1999-04       Impact factor: 4.792

3.  Comparative metagenomics of microbial communities.

Authors:  Susannah Green Tringe; Christian von Mering; Arthur Kobayashi; Asaf A Salamov; Kevin Chen; Hwai W Chang; Mircea Podar; Jay M Short; Eric J Mathur; John C Detter; Peer Bork; Philip Hugenholtz; Edward M Rubin
Journal:  Science       Date:  2005-04-22       Impact factor: 47.728

4.  Use of simulated data sets to evaluate the fidelity of metagenomic processing methods.

Authors:  Konstantinos Mavromatis; Natalia Ivanova; Kerrie Barry; Harris Shapiro; Eugene Goltsman; Alice C McHardy; Isidore Rigoutsos; Asaf Salamov; Frank Korzeniewski; Miriam Land; Alla Lapidus; Igor Grigoriev; Paul Richardson; Philip Hugenholtz; Nikos C Kyrpides
Journal:  Nat Methods       Date:  2007-04-29       Impact factor: 28.547

5.  Environmental genome shotgun sequencing of the Sargasso Sea.

Authors:  J Craig Venter; Karin Remington; John F Heidelberg; Aaron L Halpern; Doug Rusch; Jonathan A Eisen; Dongying Wu; Ian Paulsen; Karen E Nelson; William Nelson; Derrick E Fouts; Samuel Levy; Anthony H Knap; Michael W Lomas; Ken Nealson; Owen White; Jeremy Peterson; Jeff Hoffman; Rachel Parsons; Holly Baden-Tillson; Cynthia Pfannkoch; Yu-Hui Rogers; Hamilton O Smith
Journal:  Science       Date:  2004-03-04       Impact factor: 47.728

6.  The airborne metagenome in an indoor urban environment.

Authors:  Susannah G Tringe; Tao Zhang; Xuguo Liu; Yiting Yu; Wah Heng Lee; Jennifer Yap; Fei Yao; Sim Tiow Suan; Seah Keng Ing; Matthew Haynes; Forest Rohwer; Chia Lin Wei; Patrick Tan; James Bristow; Edward M Rubin; Yijun Ruan
Journal:  PLoS One       Date:  2008-04-02       Impact factor: 3.240

7.  Phylogenetic classification of short environmental DNA fragments.

Authors:  Lutz Krause; Naryttza N Diaz; Alexander Goesmann; Scott Kelley; Tim W Nattkemper; Forest Rohwer; Robert A Edwards; Jens Stoye
Journal:  Nucleic Acids Res       Date:  2008-02-19       Impact factor: 16.971

8.  Estimating the extent of horizontal gene transfer in metagenomic sequences.

Authors:  Javier Tamames; Andrés Moya
Journal:  BMC Genomics       Date:  2008-03-24       Impact factor: 3.969
