
Unifying the known and unknown microbial coding sequence space.

Chiara Vanni, Matthew S Schechter, Silvia G Acinas, Albert Barberán, Pier Luigi Buttigieg, Emilio O Casamayor, Tom O Delmont, Carlos M Duarte, A Murat Eren, Robert D Finn, Renzo Kottmann, Alex Mitchell, Pablo Sánchez, Kimmo Siren, Martin Steinegger, Frank Oliver Gloeckner, Antonio Fernàndez-Guerra.

Abstract

Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40-60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction in analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS, and a demonstration of how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction, and predominantly taxonomically restricted at the species level. From the 71 M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.
© 2022, Vanni et al.

Keywords:  bioinformatics; computational biology; functional metagenomics; gene clusters; infectious disease; microbial genomics; microbiology; phylogenomics; systems biology; unknown function

Year:  2022        PMID: 35356891      PMCID: PMC9132574          DOI: 10.7554/eLife.67667

Source DB:  PubMed          Journal:  Elife        ISSN: 2050-084X            Impact factor:   8.713


