Composite Metagenome-Assembled Genomes Reduce the Quality of Public Genome Repositories.

Alon Shaiber, A Murat Eren

Year:  2019        PMID: 31164461      PMCID: PMC6550520          DOI: 10.1128/mBio.00725-19

Source DB:  PubMed          Journal:  mBio            Impact factor:   7.867



LETTER

In their recent study, Espinoza et al. employ genome-resolved metagenomics to investigate supragingival plaque metagenomes of 88 individuals (1). The 34 metagenome-assembled genomes (MAGs) that the authors report include those that resolve to clades that have largely evaded cultivation efforts, such as Gracilibacteria (formerly GN02) and Saccharibacteria (formerly TM7) of the recently described Candidate Phyla Radiation (2). Generating new genomic insights into the understudied members of the human oral cavity is of critical importance for a comprehensive understanding of the microbial ecology and functioning of this biome, and we acknowledge the contribution of the authors on this front. However, the redundant occurrence of bacterial single-copy core genes suggests that more than half of the MAGs that Espinoza et al. report are composite genomes that do not meet the recent quality guidelines suggested by the community (3). Composite genomes that aggregate sequences originating from multiple distinct populations can yield misleading insights when treated and reported as single genomes (4). To briefly demonstrate their composite nature, we refined some of the key Espinoza et al. MAGs using a previously described approach (5) and the data that the authors kindly provided (1). We found that MAG IV.A, MAG IV.B, and MAG III.A each described multiple discrete populations with distinct distribution patterns across individuals (Fig. 1). A phylogenomic analysis placed the refined MAG IV.A genomes in the candidate phylum Absconditabacteria (formerly SR1) rather than in Gracilibacteria, as reported by Espinoza et al. (Fig. 1D). A pangenomic analysis of the original and refined MAG III.A genomes with other publicly available Saccharibacteria genomes showed that refinement yielded a 7-fold increase in the number of single-copy core genes (Fig. 1E).
These findings demonstrate the potential implications of composite MAGs for comparative genomics studies, where single-copy core genes are commonly used to infer diversity, phylogeny, and taxonomy (6). Composite MAGs can also lead to inaccurate ecological insights through inflated abundance and prevalence estimates. For instance, the original MAG III.A recruited a total of 1,849,593 reads from the Espinoza et al. metagenomes; however, the most abundant refined III.A genome (MAG III.A.2, Fig. 1C) recruited only 629,291 reads, about 34% of that total.
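To make the link between redundant single-copy core genes and composite bins concrete, the following is a minimal sketch of how completion and redundancy (C/R) estimates can be derived from gene hit counts. It assumes one common convention (completion as the share of expected genes found at least once, redundancy as extra copies relative to the size of the collection), which is not necessarily the exact formula used by anvi'o. The function name and the toy gene names are hypothetical and serve only as illustration.

```python
from collections import Counter


def completion_and_redundancy(expected_scgs, observed_hits):
    """Return (completion %, redundancy %) for one genome bin.

    expected_scgs: names of the single-copy core genes in the collection.
    observed_hits: one gene name per hit found in the bin (repeats allowed).
    Assumed convention, not a confirmed anvi'o formula.
    """
    expected = set(expected_scgs)
    counts = Counter(h for h in observed_hits if h in expected)
    # Completion: share of expected genes seen at least once.
    completion = 100.0 * len(counts) / len(expected)
    # Redundancy: copies beyond the first, relative to the collection size.
    extra_copies = sum(n - 1 for n in counts.values())
    redundancy = 100.0 * extra_copies / len(expected)
    return completion, redundancy


# Toy collection of 10 hypothetical genes, not a real SCG collection.
scgs = [f"scg_{i:02d}" for i in range(10)]
single_population = scgs[:9]       # 9 genes, one copy each
composite = scgs[:9] + scgs[:6]    # two populations merged into one bin

print(completion_and_redundancy(scgs, single_population))  # (90.0, 0.0)
print(completion_and_redundancy(scgs, composite))          # (90.0, 60.0)
```

In this toy case both bins look equally complete, and only the redundancy estimate reveals that the second bin aggregates genes from more than one population.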
FIG 1

Refinement of three composite genome bins. (A to C) The top left corners of these panels display the original name of a given Espinoza et al. MAG (see Table 1 in the original study) and its estimated completion and redundancy (C/R) based on a bacterial single-copy core gene collection (10). Each concentric circle represents one of the 88 metagenomes in the original study, dendrograms show hierarchical clustering of contigs based on sequence composition and differential mean coverage across metagenomes (using Euclidean distance and Ward’s method), and each data point represents the read recruitment statistic of a given contig in a given metagenome. Arcs at the outermost layers mark contigs that belong to a refined bin along with their new completion and redundancy estimates (C/R). (D) The phylogenomic tree organizes genomes based on 37 concatenated ribosomal proteins. Coloring of genome names matches their taxonomy in NCBI, and branch colors match the consensus taxonomy of genomes they represent. Espinoza et al. reported MAG IV.A as Gracilibacteria (hence the red color); however, this phylogenomic analysis places refined MAGs under Absconditabacteria. (E) Pangenomic analysis of Espinoza et al. Saccharibacteria MAG III.A before (left) and after (right) refinement together with the Saccharibacteria genomes from panel D. Pangenomes describe 575 and 497 gene clusters, respectively, where each concentric circle represents a genome and bars correspond to the number of genes that a given genome is contributing to a given gene cluster (the maximum value is set to 2 for readability). Outermost layers mark single-copy core gene clusters to which every genome contributes precisely a single gene. We used Bowtie2 (11) to recruit reads from metagenomes, and anvi’o (12) to visualize and refine Espinoza et al. MAGs. FAMSA (13) aligned anvi’o-reported ribosomal protein amino acid sequences, trimAl (14) curated them, and IQ-TREE (15) computed the tree for the phylogenomic analysis. 
Anvi’o used DIAMOND (16) and MCL (17) algorithms to determine pangenomes. A reproducible bioinformatics workflow and FASTA files for refined MAGs are available at http://merenlab.org/data/refining-espinoza-mags.
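The definition of single-copy core gene clusters in panel E can be illustrated with a short sketch. It assumes that a pangenome is available as a table of per-genome gene counts for each gene cluster; the data structure, function name, and toy values below are hypothetical and do not reflect how anvi'o stores or computes pangenomes.

```python
def single_copy_core_clusters(gene_clusters, genomes):
    """Return names of clusters to which every genome contributes exactly one gene.

    gene_clusters: dict of cluster name -> dict of genome name -> gene count.
    genomes: names of all genomes in the pangenome.
    """
    return sorted(
        name
        for name, per_genome in gene_clusters.items()
        if all(per_genome.get(genome, 0) == 1 for genome in genomes)
    )


genomes = ["ref_1", "ref_2", "mag"]

# Hypothetical counts with a composite MAG that carries extra gene copies.
before = {
    "gc_1": {"ref_1": 1, "ref_2": 1, "mag": 2},
    "gc_2": {"ref_1": 1, "ref_2": 1, "mag": 1},
    "gc_3": {"ref_1": 1, "ref_2": 1, "mag": 3},
    "gc_4": {"ref_1": 1, "mag": 1},  # absent from ref_2, so never core
}

# The same clusters after refinement leaves one copy per cluster in the MAG.
after = {
    name: {genome: (1 if genome == "mag" else n) for genome, n in counts.items()}
    for name, counts in before.items()
}

print(single_copy_core_clusters(before, genomes))  # ['gc_2']
print(single_copy_core_clusters(after, genomes))   # ['gc_1', 'gc_2', 'gc_3']
```

A single genome with redundant gene copies is enough to disqualify a cluster, which is why a composite MAG can sharply reduce the number of single-copy core gene clusters in a pangenome.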

Co-assembly of a large number of metagenomes that contain very closely related populations often hinders confident assignments of shared contigs into individual bins. Nevertheless, even when proper refinement is not possible, reporting composite MAGs as single genomes should be avoided. As of today, highly composite Espinoza et al. MAGs (Fig. 1 in this letter and Table 1 in the work of Espinoza et al.) are available as single genomes in public databases of the National Center for Biotechnology Information (NCBI). The rapidly increasing number of MAGs in public databases already competes with the total number of microbial isolate genomes (3), and increasingly frequent studies that report large collections of MAGs offer a glimpse of the future (7–9). Despite their growing availability, metagenomes are inherently complex and require researchers to orchestrate an intricate combination of rapidly evolving computational tools and approaches with many alternatives to reconstruct, characterize, and finalize MAGs. We must continue to champion studies such as the one by Espinoza et al. for their contribution to our collective effort to shed light on the darker branches of the ever-growing Tree of Life. At the same time, editors and reviewers of genome-resolved metagenomics studies should properly scrutinize the quality and accuracy of MAGs prior to their publication. A systematic failure to do so will reduce the quality of public genome repositories while yielding adverse effects such as misleading insights into novel microbial groups and reduced trust among scientists in findings that emerge from genome-resolved metagenomics.
References (17 in total; the 10 listed below are not numbered in citation order)

1.  Using MCL to extract clusters from networks.

Authors:  Stijn van Dongen; Cei Abreu-Goodger
Journal:  Methods Mol Biol       Date:  2012

2.  UGA is an additional glycine codon in uncultured SR1 bacteria from the human microbiota.

Authors:  James H Campbell; Patrick O'Donoghue; Alisha G Campbell; Patrick Schwientek; Alexander Sczyrba; Tanja Woyke; Dieter Söll; Mircea Podar
Journal:  Proc Natl Acad Sci U S A       Date:  2013-03-18       Impact factor: 11.205

3.  Fast gapped-read alignment with Bowtie 2.

Authors:  Ben Langmead; Steven L Salzberg
Journal:  Nat Methods       Date:  2012-03-04       Impact factor: 28.547

4.  Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life.

Authors:  Donovan H Parks; Christian Rinke; Maria Chuvochina; Pierre-Alain Chaumeil; Ben J Woodcroft; Paul N Evans; Philip Hugenholtz; Gene W Tyson
Journal:  Nat Microbiol       Date:  2017-09-11       Impact factor: 17.745

5.  No evidence for extensive horizontal gene transfer in the genome of the tardigrade Hypsibius dujardini.

Authors:  Georgios Koutsovoulos; Sujai Kumar; Dominik R Laetsch; Lewis Stevens; Jennifer Daub; Claire Conlon; Habib Maroon; Fran Thomas; Aziz A Aboobaker; Mark Blaxter
Journal:  Proc Natl Acad Sci U S A       Date:  2016-03-24       Impact factor: 11.205

6.  IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies.

Authors:  Lam-Tung Nguyen; Heiko A Schmidt; Arndt von Haeseler; Bui Quang Minh
Journal:  Mol Biol Evol       Date:  2014-11-03       Impact factor: 16.240

7.  Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea.

Authors:  Robert M Bowers; Nikos C Kyrpides; Ramunas Stepanauskas; Miranda Harmon-Smith; Devin Doud; T B K Reddy; Frederik Schulz; Jessica Jarett; Adam R Rivers; Emiley A Eloe-Fadrosh; Susannah G Tringe; Natalia N Ivanova; Alex Copeland; Alicia Clum; Eric D Becraft; Rex R Malmstrom; Bruce Birren; Mircea Podar; Peer Bork; George M Weinstock; George M Garrity; Jeremy A Dodsworth; Shibu Yooseph; Granger Sutton; Frank O Glöckner; Jack A Gilbert; William C Nelson; Steven J Hallam; Sean P Jungbluth; Thijs J G Ettema; Scott Tighe; Konstantinos T Konstantinidis; Wen-Tso Liu; Brett J Baker; Thomas Rattei; Jonathan A Eisen; Brian Hedlund; Katherine D McMahon; Noah Fierer; Rob Knight; Rob Finn; Guy Cochrane; Ilene Karsch-Mizrachi; Gene W Tyson; Christian Rinke; Alla Lapidus; Folker Meyer; Pelin Yilmaz; Donovan H Parks; A M Eren; Lynn Schriml; Jillian F Banfield; Philip Hugenholtz; Tanja Woyke
Journal:  Nat Biotechnol       Date:  2017-08-08       Impact factor: 54.908

8.  Identifying contamination with advanced visualization and analysis practices: metagenomic approaches for eukaryotic genome assemblies.

Authors:  Tom O Delmont; A Murat Eren
Journal:  PeerJ       Date:  2016-03-29       Impact factor: 2.984

9.  Supragingival Plaque Microbiome Ecology and Functional Potential in the Context of Health and Disease.

Authors:  Josh L Espinoza; Derek M Harkins; Manolito Torralba; Andres Gomez; Sarah K Highlander; Marcus B Jones; Pamela Leong; Richard Saffery; Michelle Bockmann; Claire Kuelbs; Jason M Inman; Toby Hughes; Jeffrey M Craig; Karen E Nelson; Chris L Dupont
Journal:  MBio       Date:  2018-11-27       Impact factor: 7.867

10.  Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle.

Authors:  Edoardo Pasolli; Francesco Asnicar; Serena Manara; Moreno Zolfo; Nicolai Karcher; Federica Armanini; Francesco Beghini; Paolo Manghi; Adrian Tett; Paolo Ghensi; Maria Carmen Collado; Benjamin L Rice; Casey DuLong; Xochitl C Morgan; Christopher D Golden; Christopher Quince; Curtis Huttenhower; Nicola Segata
Journal:  Cell       Date:  2019-01-17       Impact factor: 41.582
