Ignacio de la Higuera, George W Kasun, Ellis L Torrance, Alyssa A Pratt, Amberlee Maluenda, Jonathan Colombet, Maxime Bisseux, Viviane Ravet, Anisha Dayaram, Daisy Stainton, Simona Kraberger, Peyman Zawar-Reza, Sharyn Goldstien, James V Briskie, Robyn White, Helen Taylor, Christopher Gomez, David G Ainley, Jon S Harding, Rafaela S Fontenele, Joshua Schreck, Simone G Ribeiro, Stephen A Oswald, Jennifer M Arnold, François Enault, Arvind Varsani, Kenneth M Stedman.
Abstract
The discovery of cruciviruses revealed the most explicit example of a common protein homologue between DNA and RNA viruses to date. Cruciviruses are a novel group of circular Rep-encoding single-stranded DNA (ssDNA) (CRESS-DNA) viruses that encode capsid proteins that are most closely related to those encoded by RNA viruses in the family Tombusviridae. The apparent chimeric nature of the two core proteins encoded by crucivirus genomes suggests horizontal gene transfer of capsid genes between DNA and RNA viruses. Here, we identified and characterized 451 new crucivirus genomes and 10 capsid-encoding circular genetic elements through de novo assembly and mining of metagenomic data. These genomes are highly diverse, as demonstrated by sequence comparisons and phylogenetic analysis of subsets of the protein sequences they encode. Most of the variation is reflected in the replication-associated protein (Rep) sequences, and much of the sequence diversity appears to be due to recombination. Our results suggest that recombination tends to occur more frequently among groups of cruciviruses with relatively similar capsid proteins and that the exchange of Rep protein domains between cruciviruses is rarer than intergenic recombination. Additionally, we suggest members of the stramenopiles/alveolates/Rhizaria supergroup as possible crucivirus hosts. Altogether, we provide a comprehensive and descriptive characterization of cruciviruses.
IMPORTANCE
Viruses are the most abundant biological entities on Earth. In addition to their impact on animal and plant health, viruses have important roles in ecosystem dynamics as well as in the evolution of the biosphere. Circular Rep-encoding single-stranded (CRESS) DNA viruses are ubiquitous in nature, many are agriculturally important, and they appear to have multiple origins from prokaryotic plasmids. A subset of CRESS-DNA viruses, the cruciviruses, have homologues of capsid proteins encoded by RNA viruses. The genetic structure of cruciviruses attests to the transfer of capsid genes between disparate groups of viruses. However, the evolutionary history of cruciviruses is still unclear. By collecting and analyzing cruciviral sequence data, we provide a deeper insight into the evolutionary intricacies of cruciviruses. Our results reveal an unexpected diversity of this virus group, with frequent recombination as an important determinant of variability.
Keywords: CRESS-DNA viruses; crucivirus; environmental virology; gene transfer; recombination; virus evolution
Year: 2020 PMID: 32873755 PMCID: PMC7468197 DOI: 10.1128/mBio.01410-20
Source DB: PubMed Journal: mBio Impact factor: 7.867
FIG 1 Genome properties of 461 new cruciviral circular sequences. (A) Histogram of cruciviral genome lengths categorized in 50-nt bins. (B) Percentage of G+C content versus A+T in each of the sequences described in this study. (C) Relative abundance of nucleotides in the conserved nonanucleotide sequence of the 211 stem-loops and putative origins of replication predicted with StemLoop-Finder (A. A. Pratt et al., unpublished), represented in sequence logo format.
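The two summary statistics in panels A and B can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names, the handling of ambiguous bases, and the toy lengths are assumptions made for illustration only.

```python
from collections import Counter


def gc_percent(seq: str) -> float:
    """Percentage of G+C among the unambiguous A/C/G/T bases of a sequence.

    Assumption: ambiguous symbols (e.g. N) are ignored rather than counted.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


def length_bin(length: int, width: int = 50) -> int:
    """Lower edge of the bin (of the given width, in nt) that holds `length`."""
    return (length // width) * width


def length_histogram(lengths, width: int = 50) -> Counter:
    """Count genomes per length bin; keys are the lower bin edges."""
    return Counter(length_bin(n, width) for n in lengths)


if __name__ == "__main__":
    # Hypothetical genome lengths, not values from the study.
    toy_lengths = [3010, 3049, 3050, 4120]
    for edge, count in sorted(length_histogram(toy_lengths).items()):
        print(f"{edge}-{edge + 49} nt: {count}")
    print(f"G+C of ATGC: {gc_percent('ATGC'):.1f}%")
```

Using the lower bin edge as the dictionary key keeps the histogram sparse, so empty bins need no special handling.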
FIG 2 Protein conservation in cruciviruses. (A) (Top) Distribution of domains, isoelectric point, and conservation in a consensus capsid protein. Four hundred sixty-one capsid protein sequences were aligned in Geneious 11.0.4 with MAFFT (G-INS-i, BLOSUM 45, open gap penalty 1.53, offset 0.123) and trimmed manually. The conservation of the physicochemical properties at each position was obtained with Jalview v2.11.0 (88), and the isoelectric point was estimated in Geneious 11.0.4. The region of the capsid protein rich in glycine is highlighted with a green bar. (Bottom) Structure of a cruciviral capsid protein (CruV-359) as predicted by Phyre2 showing sequence conservation based on an alignment of the 47 capsid protein sequences from the capsid protein-based clusters. (B) Conserved motifs found in cruciviral Reps after aligning all the extracted Rep protein sequences using PSI-Coffee (94). Sequence logos were generated at http://weblogo.threeplusone.com to indicate the frequency of residues at each position.
FIG 3 Diversity of cruciviral proteins. (A) Capsid protein diversity. Pairwise amino acid identity (%PI) between the capsid proteins predicted for 461 cruciviral sequences. The alignment and analysis were carried out with SDT, using the integrated MAFFT algorithm. (B) S-domain diversity. (Left) Pairwise identity matrix between the capsid protein predicted S-domains of the 461 sequences described in this study. The alignment and analysis were carried out with SDT, using the integrated MAFFT algorithm (87). The colored boxes indicate the different clusters of sequences used to create the capsid protein-based cluster sequence subset. (Right) Unrooted phylogenetic tree obtained with FastTree from a manually curated MAFFT alignment of the translated sequences of the S-domain (G-INS-i, BLOSUM 45, open gap penalty 1.53, offset 0.123) (93, 96). The colored branches represent the different clusters observed in the matrix. Scale bar indicates substitutions per site. (C) Rep diversity. (Left) Pairwise identity matrix between all Reps found in cruciviral genomes in this study. The alignment and analysis were carried out with SDT, using the integrated MUSCLE algorithm (87). (Right) Unrooted phylogenetic tree obtained with FastTree from a PSI-Coffee alignment of the translated sequences of Rep trimmed with TrimAl v1.3 (93–96). The colored branches represent the different clusters that contain the Rep-based cluster sequence subset. Scale bar indicates substitutions per site. (D) Pairwise identity frequency distribution. The frequency of pairwise identity values for each of the putative proteins or domains analyzed is shown.
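As a rough illustration of what a pairwise identity matrix and its frequency distribution (panel D) summarize, the sketch below computes percent identity between already-aligned sequences and bins the values. It is a simplified stand-in, not SDT itself: skipping columns where either sequence has a gap is an assumption, and the function names and toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns where either sequence has a gap are not compared.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def identity_distribution(aligned: dict, bin_width: int = 10) -> Counter:
    """Count sequence pairs per identity bin; keys are lower bin edges in %."""
    counts = Counter()
    for (_, s1), (_, s2) in combinations(aligned.items(), 2):
        pid = pairwise_identity(s1, s2)
        # A value of exactly 100% is placed in the top bin.
        edge = min(int(pid // bin_width) * bin_width, 100 - bin_width)
        counts[edge] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy alignment, not sequences from the study.
    toy = {"cp_a": "MKTA", "cp_b": "MKTA", "cp_c": "MKSA"}
    print(dict(identity_distribution(toy)))
```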
FIG 4 Similarity networks of cruciviral proteins with related viruses. (A) Capsid proteins, represented by colored dots, are connected with a solid line when the pairwise similarity, as assessed by the EFI-EST web server (100), has an E value of <10^-20. The dashed line represents an E value of 6 × 10^-7 between the nodes corresponding to the capsid protein of CruV-523 and turnip crinkle virus, as given by BLASTp. (B) Replication-associated protein (Rep) translations, represented by colored dots, are connected with a solid line when the pairwise similarity has an E value of <10^-10. The eight nodes at the bottom left did not connect to any other node. All networks were built from pairwise identities calculated with the EFI-EST web server and visualized in Cytoscape v3.7.2 (100, 101).
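The idea of an E-value-thresholded similarity network can be sketched in a few lines of Python. This is only an illustration of the thresholding step, not the EFI-EST or Cytoscape workflow: the node names and E values are invented, and the cutoff shown is the one quoted for the Rep network in panel B.

```python
def build_network(nodes, hits, threshold):
    """Undirected adjacency map keeping only hits with E value below threshold.

    `hits` is an iterable of (node_a, node_b, e_value) tuples (hypothetical
    input format). Nodes without any qualifying hit stay in the map, unconnected.
    """
    adjacency = {n: set() for n in nodes}
    for a, b, e_value in hits:
        if a != b and e_value < threshold:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    return adjacency


def connected_components(adjacency):
    """Return the connected components of the network as a list of sets."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


if __name__ == "__main__":
    # Invented nodes and E values for illustration only.
    nodes = ["rep1", "rep2", "rep3", "rep4"]
    hits = [("rep1", "rep2", 1e-30), ("rep2", "rep3", 1e-25), ("rep3", "rep4", 1e-5)]
    network = build_network(nodes, hits, threshold=1e-10)
    for component in connected_components(network):
        print(sorted(component))
```

Singleton components correspond to nodes that, like those described in panel B, do not connect to any other node at the chosen cutoff.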
FIG 5 Comparison of phylogenies of capsid and Rep proteins of representative cruciviruses. (A) Tanglegram calculated with Dendroscope v3.5.10 from phylogenetic trees generated with PhyML from capsid protein (PhyML automatic model selection LG+G+I+F) and Rep (PhyML automatic model selection RtREV+G+I) alignments (97, 99). The tips corresponding to the same viral genome are linked by lines that are color coded according to the clusters obtained from Fig. 3A (capsid protein-based clusters). (B) Tanglegram calculated with Dendroscope v3.5.10 from phylogenetic trees generated with PhyML from capsid protein (PhyML automatic model selection LG+G+I+F) and Rep (PhyML automatic model selection RtREV+G+I) alignments (99). The tips corresponding to the same viral sequence are linked by lines that are color coded according to the clusters obtained from Fig. 3B (Rep-based clusters). The clade marked with a red asterisk is formed by members of the red capsid protein-based cluster. Branch support is given as aLRT SH-like values (97). All nodes with an aLRT SH-like branch support below 0.8 were collapsed with Dendroscope prior to constructing the tanglegram.
FIG 6 Comparison of phylogenies between the endonuclease and helicase domains of Reps from representative cruciviruses. (A) Tanglegram calculated with Dendroscope v3.5.10 from phylogenetic trees generated with PhyML from separate alignments of Rep endonuclease and helicase domains (97, 99). The tips corresponding to the same viral genome are linked by lines that are color coded according to the clusters obtained from Fig. 3A (capsid protein-based clusters). (B) Same as panel A but with sequences from the clusters obtained from Fig. 3B (Rep-based clusters). All nodes with an aLRT SH-like branch support below 0.8 were collapsed with Dendroscope v3.5.10 prior to constructing the tanglegram (99).