| Literature DB >> 35790962 |
Raphaël Méheust1,2,3, Cindy J Castelle4,5, Alexander L Jaffe6, Jillian F Banfield7,8,9,10.
Abstract
BACKGROUND: Archaea play fundamental roles in the environment, for example by methane production and consumption, ammonia oxidation, protein degradation, carbon compound turnover, and sulfur compound transformations. Recent genomic analyses have profoundly reshaped our understanding of the distribution and functionalities of Archaea and their roles in eukaryotic evolution.Entities:
Keywords: Archaea; Bioinformatics; Comparative genomics; Protein family
Mesh:
Year: 2022 PMID: 35790962 PMCID: PMC9258230 DOI: 10.1186/s12915-022-01348-6
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.364
Fig. 1The distribution of the 10,866 families across the 1179 representative genomes. A The distribution of 10,866 widely distributed protein families (columns) in 1179 representative genomes (rows) from Archaea. Data are clustered based on the presence (black) and absence (white) profiles (Jaccard distance, complete linkage). B Tree resulting from the hierarchical clustering of the genomes based on the distributions of protein families in A
Fig. 2The distribution of the 2632 families of the 19 modules discussed in this study. Each column represents a protein family and each row represents a genome. Data are clustered based on the presence (black)/absence (white) profiles but also based on the taxonomy of the genomes and the module membership. The first colored top bar (annotations) shows the families with (black)/without (white) a predicted annotation whereas the second colored top bar (modules) indicates the module of each family. The colored side bar indicates the taxonomic assignment of each genome
A list of the fourteen modules that are lineage specific but also well conserved within eleven major archaeal lineages. A family was counted as having a signal peptide if at least 25% of its protein sequences were predicted to have a signal peptide prediction according to the SignalP software [32]. A family was counted as having a transmembrane helix if more than half of its protein sequences were predicted to have a transmembrane helix according to the TMHMM software [33]. Families were considered hypothetical if they have neither PFAM (Domain of Unknown Function domains were excluded) nor KEGG annotations (see the supplementary dataset - Table S3 for the full list of hypothetical families). Finally, a family was considered to have bacterial homologs if the family matched with protein sequences of at least ten distinct bacterial genomes (see the “Methods” section). The core module 1 is included as a comparison
Fig. 3Schematic overview of integrin-like and TFIIH-like gene clusters identified in archaea. A Conserved gene clusters comprising archaeal integrin-like genes (fam15271 ) identified in five Asgard genomes. B Conserved gene clusters comprising archaeal TFIIH-like genes (fam18955) identified in three Theionarchaea and three Asgard genomes. A full gene synteny and genomic context of the genes neighboring the integrin-like (fam15271) and TFIIH-like (fam18955) genes is available in Additional file 1: Table S8