The complexity landscape of viral genomes.

Jorge Miguel Silva, Diogo Pratas, Tânia Caetano, Sérgio Matos.

Abstract

BACKGROUND: Viruses are among the shortest yet highly abundant species, harboring minimal instructions to infect cells, adapt, multiply, and exist. However, despite the current substantial availability of viral genome sequences, the scientific literature lacks a complexity landscape that automatically sheds light on the organization, relations, and fundamental characteristics of viral genomes.
RESULTS: This work provides a comprehensive landscape of viral genome complexity (or quantity of information), identifying the most redundant and the most complex groups with respect to their genome sequences and describing their distribution and characteristics at both large and local scales. Moreover, we identify and quantify the abundance of inverted repeats in viral genomes. For this purpose, we measure the sequence complexity of each available viral genome using data compression, demonstrating that adequate data compressors can efficiently quantify the complexity of viral genome sequences, including subsequences better represented by algorithmic sources (e.g., inverted repeats). Using a state-of-the-art genomic compressor on an extensive viral genome database, we show that double-stranded DNA viruses are, on average, the most redundant viruses, while single-stranded DNA viruses are the least redundant. In contrast, double-stranded RNA viruses show lower redundancy than single-stranded RNA viruses. Furthermore, we extend the ability of data compressors to quantify local complexity (or information content) in viral genomes using complexity profiles, providing an unprecedented direct complexity analysis of human herpesviruses. We also devise a feature-based classification methodology that can accurately distinguish viral genomes at different taxonomic levels without direct comparisons between sequences. This methodology combines data compression with simple measures such as GC-content percentage and sequence length, followed by machine learning classifiers.
CONCLUSIONS: This article presents methodologies and findings that are highly relevant for understanding the patterns of similarity and singularity between viral groups, opening new frontiers for studying the organization of viral genomes while depicting the complexity trends and classification components of these genomes at different taxonomic levels. The whole study is supported by an extensive website (https://asilab.github.io/canvas/) for exploring the characterization of viral genomes through dynamic and interactive approaches.
© The Author(s) 2022. Published by Oxford University Press GigaScience.
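
The feature set named in the abstract (a compression-based complexity value, GC-content, and sequence length) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `zlib` is a general-purpose stand-in for the state-of-the-art genomic compressor used in the study, the normalization by 2 bits per base is one common convention rather than a formula stated in the abstract, and the function names (`gc_content`, `compression_complexity`, `genome_features`, `random_sequence`) are hypothetical.

```python
import random
import zlib


def gc_content(seq: str) -> float:
    """Fraction of G and C among the A, C, G, T symbols of the sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def compression_complexity(seq: str) -> float:
    """Compressed size in bits divided by 2 bits per base.

    Assumption: zlib stands in for a dedicated genomic compressor, and
    2 bits per base (log2 of a 4-symbol alphabet) is the normalizer.
    Lower values indicate a more redundant sequence.
    """
    data = seq.upper().encode("ascii")
    if not data:
        return 0.0
    compressed_bits = 8 * len(zlib.compress(data, 9))
    return compressed_bits / (2 * len(data))


def genome_features(seq: str) -> dict:
    """Feature vector that could be passed to any machine learning classifier."""
    return {
        "length": len(seq),
        "gc": gc_content(seq),
        "complexity": compression_complexity(seq),
    }


def random_sequence(n: int, seed: int = 0) -> str:
    """Hypothetical helper: a pseudo-random DNA string for demonstration only."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=n))


if __name__ == "__main__":
    repetitive = "ACGT" * 500          # highly redundant toy sequence
    shuffled = random_sequence(2000)   # low-redundancy toy sequence
    print(genome_features(repetitive))
    print(genome_features(shuffled))
```

The repetitive toy sequence should yield a much lower complexity value than the pseudo-random one. Note that a general-purpose compressor such as `zlib` only exploits direct repeats; it does not model inverted repeats, which is one reason a specialized genomic compressor is the better tool for real viral genomes.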

Keywords:  algorithmic information theory; cladograms; data compression; genomics; sequence analysis; viral classification; viruses

Year:  2022        PMID: 35950839      PMCID: PMC9366995          DOI: 10.1093/gigascience/giac079

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   7.658

