Luis Pedro Coelho, Renato Alves, Álvaro Rodríguez Del Río, Pernille Neve Myers, Carlos P Cantalapiedra, Joaquín Giner-Lamia, Thomas Sebastian Schmidt, Daniel R Mende, Askarbek Orakov, Ivica Letunic, Falk Hildebrand, Thea Van Rossum, Sofia K Forslund, Supriya Khedkar, Oleksandr M Maistrenko, Shaojun Pan, Longhao Jia, Pamela Ferretti, Shinichi Sunagawa, Xing-Ming Zhao, Henrik Bjørn Nielsen, Jaime Huerta-Cepas, Peer Bork.
Abstract
Microbial genes encode the majority of the functional repertoire of life on Earth. However, despite increasing efforts in metagenomic sequencing of various habitats [1-3], little is known about the distribution of genes across the global biosphere, with implications for human and planetary health. Here we constructed a non-redundant gene catalogue of 303 million species-level genes (clustered at 95% nucleotide identity) from 13,174 publicly available metagenomes across 14 major habitats and used it to show that most genes are specific to a single habitat. The small fraction of genes found in multiple habitats is enriched in antibiotic-resistance genes and markers for mobile genetic elements. By further clustering these species-level genes into 32 million protein families, we observed that a small fraction of these families contains the majority of the genes (0.6% of families account for 50% of the genes). The majority of species-level genes and protein families are rare. Furthermore, species-level genes, and in particular the rare ones, show low rates of positive (adaptive) selection, supporting a model in which most genetic variability observed within each protein family is neutral or nearly neutral.
Year: 2021 PMID: 34912116 PMCID: PMC7613196 DOI: 10.1038/s41586-021-04233-4
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 69.504