Alessandro Rossi1, Laura Treu1, Stefano Toppo2, Henrike Zschach3, Stefano Campanaro1,4, Bas E Dutilh5.
Abstract
crAss-like viruses are a recently discovered putative family of bacteriophages. The eponym of the clade, crAssphage, is an enteric bacteriophage estimated to be present in at least half of the human population; it constitutes up to 90% of the sequences in some human fecal viral metagenomic datasets. We focused on the evolutionary dynamics of the genes encoded on the crAssphage genome. By investigating the conservation of the genes, we found consistent variation in evolutionary rates across the different functional groups. We also detected gene duplications in crAss-like genomes. By exploring the differences among the functional categories of the genes, we confirmed that the genes encoding capsid proteins were the most ubiquitous, despite their overall low sequence conservation. We identified a core of proteins whose evolutionary trees strongly correlate with each other, suggesting their genetic interaction. This group includes the capsid proteins, which are therefore particularly well suited to reconstructing the phylogenetic tree of this viral clade. We also show a negative correlation between the ubiquity and the conservation of viral protein sequences. Together, this study provides an in-depth picture of the evolution of different genes in crAss-like viruses.
Keywords: crAssphage; gene evolution; human gut; metaviromics; mirrortree
Year: 2020 PMID: 32957679 PMCID: PMC7551546 DOI: 10.3390/v12091035
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. Number of sequences in each group of protein homologs (cluster size) according to functional category. The three colored bars correspond to the similarity thresholds applied to filter alignments (50%, 80%, and 95%, respectively). The capsid proteins are the most frequently present in the crAss-like contigs. See Table S2 for the names of all homologous groups of proteins and values for all statistics.
Figure 2. Correlation between the logarithm of the number of sequences in a protein family and the Shannon information content of the positions in the protein sequence alignment. Inverse correlations were obtained both for crAss-like viruses and pVOGs. For proteins in crAss-like viruses, only functional classes having more than four genes are reported.
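The two quantities plotted in Figure 2 can be illustrated with a minimal Python sketch. It assumes that information content is defined per alignment column as log2(20) minus the Shannon entropy of the residues in that column, with gaps ignored, and then averaged over columns. The record does not state the exact definition the authors used, and the function names `column_information` and `mean_information` are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    Assumption: information = log2(alphabet_size) - Shannon entropy,
    computed over non-gap residues only.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0  # a gap-only column carries no information
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def mean_information(alignment):
    """Average per-column information content of an aligned protein family.

    `alignment` is a list of equal-length aligned sequences (strings).
    """
    scores = [column_information(col) for col in zip(*alignment)]
    return sum(scores) / len(scores)


# Toy family of four aligned sequences (illustrative data, not from the study).
family = ["MKVA", "MKIA", "MRVA", "MKV-"]
x = math.log10(len(family))    # logarithm of the cluster size
y = mean_information(family)   # mean Shannon information content
```

Computing one such (x, y) pair per protein family and correlating them across families would give the kind of relationship the figure describes; the logarithm base used for cluster size is an arbitrary choice here.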
Figure 3. Mirrortree algorithm applied to the homologous groups of proteins. Histogram representing the distribution of the pairwise Pearson’s r coefficients. A great number of genes appear to be coevolving. Heatmap of the Pearson correlation coefficient of each protein with every other. The 92 ORFs identified in the reference crAssphage genome are represented along the x and y axes. Interactions between clusters sharing fewer than five sequences are colored white, to avoid confusion arising from small sample sizes. Histogram representing the average correlation coefficient of each protein, plotted at its position on the reference genome. The different colors represent the six functional groups.
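The core idea of the mirrortree approach, correlating the pairwise evolutionary distances of two protein families over the genomes they share, can be sketched in Python as follows. This is a simplified illustration, not the authors' pipeline: the nested-dictionary input format and the names `pearson` and `mirrortree_r` are assumptions, and the `min_shared=5` default simply mirrors the caption's exclusion of cluster pairs sharing fewer than five sequences.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / math.sqrt(vx * vy)


def mirrortree_r(dist_a, dist_b, min_shared=5):
    """Correlate two families' distance matrices over their shared genomes.

    dist_a, dist_b: dict of dicts, dist[g1][g2] = evolutionary distance
    between the homologs found in genomes g1 and g2 (hypothetical format).
    Returns None when fewer than `min_shared` genomes are shared.
    """
    shared = sorted(set(dist_a) & set(dist_b))
    if len(shared) < min_shared:
        return None
    pairs = list(combinations(shared, 2))
    xs = [dist_a[p][q] for p, q in pairs]
    ys = [dist_b[p][q] for p, q in pairs]
    return pearson(xs, ys)
```

A coefficient close to 1 indicates that the two families have similar distance structures across genomes, which is the signal interpreted as coevolution; filling a protein-by-protein matrix with these coefficients would give a heatmap of the kind described in the caption.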