Shuai Jiang, Qiang Du, Changrui Feng, Lina Ma, Zhang Zhang.
Abstract
Sequence compositions of nucleic acids and proteins have a significant impact on gene expression, RNA stability, translation efficiency, RNA/protein structure and molecular function, and are associated with genome evolution and adaptation across all kingdoms of life. A dedicated resource of sequence compositions and associated features is therefore crucial for a wide range of biological research. Here, we present CompoDynamics (https://ngdc.cncb.ac.cn/compodynamics/), a comprehensive database of sequence compositions of coding sequences (CDSs) and genomes across a wide range of species. Taking advantage of the exponential growth of RefSeq data, CompoDynamics presents a wealth of sequence compositions (nucleotide content, codon usage, amino acid usage) and derived features (coding potential, physicochemical properties and phase separation) for 118 689 747 high-quality CDSs and 34 562 genomes across 24 995 species. Additionally, interactive analytical tools enable comparative analyses of sequence compositions and molecular features across different species and gene groups. Collectively, CompoDynamics has great potential to improve understanding of the roles of sequence composition dynamics across genes and genomes, providing a fundamental resource in support of a broad spectrum of biological studies.
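To make the three composition types named in the abstract concrete, the following is a minimal Python sketch of how nucleotide content, codon usage and amino acid usage might be computed for a single CDS. It is an illustration only, not the CompoDynamics pipeline: the function name `composition`, the returned keys and the toy sequence are hypothetical, and the standard genetic code (NCBI translation table 1) is assumed for every species.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), in TCAG order.
# Assumption: one code for all organisms; real genomes may need other tables.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def composition(cds):
    """Return GC content, GC3, codon counts and amino acid counts for one CDS.

    Hypothetical helper for illustration; codons containing ambiguous
    bases (e.g. N) are counted as codons but skipped for amino acid usage.
    """
    seq = cds.upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("CDS must be non-empty and a multiple of 3 in length")

    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]

    # Nucleotide content: overall GC fraction and GC at third codon positions.
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    gc3 = sum(codon[2] in "GC" for codon in codons) / len(codons)

    # Codon usage: raw counts per codon.
    codon_counts = Counter(codons)

    # Amino acid usage: translate each codon, ignoring stops and unknowns.
    aa_counts = Counter(
        CODON_TABLE[codon]
        for codon in codons
        if CODON_TABLE.get(codon, "*") != "*"
    )

    return {"gc": gc, "gc3": gc3,
            "codon_counts": codon_counts, "aa_counts": aa_counts}


if __name__ == "__main__":
    # Toy CDS: Met-Ala-Ala-stop.
    result = composition("ATGGCCGCTTAA")
    print(result["gc"], result["gc3"])
    print(dict(result["aa_counts"]))
```

Raw counts are returned rather than frequencies so that per-gene results can be summed into genome-level totals before normalizing.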
Year: 2022 PMID: 34718745 PMCID: PMC8728180 DOI: 10.1093/nar/gkab979
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Database contents and organization. The present version of CompoDynamics provides six groups of sequence compositions (nucleotide content, codon usage and amino acid usage) and features (coding potential, protein physicochemical property and phase separation) for 118 689 747 CDSs and 34 562 genomes derived from RefSeq. These contents can be easily browsed, visualized, retrieved and analyzed at both genome and gene levels.
Figure 2. Codon usage dynamics across prokaryote and eukaryote genomes. (A) Distributions of codon usage bias (CUB) in prokaryote and eukaryote genomes. CUB (represented by the effective number of codons, ENC, and the codon deviation coefficient, CDC), GC content of the genomic coding region, genome size, CDS number and metabolism type are visualized with different color palettes. For prokaryotes, organisms with a CDS count ≥100 are displayed in the cladogram. Several clades are highlighted to exemplify different kinds of strong CUB. Burkhold*: Burkholderiales; Rick*: Rickettsiales. (B) Relationship between ENC and CDC for different GC values in prokaryotes and eukaryotes. (C) Codon usage and amino acid usage across six species categories in eukaryotes.
Figure 3. Sequence composition and feature comparisons between genes with different GO terms. (A) Five groupings of yeast genes, according to GO terms, namely, ‘cytosolic large ribosomal subunit’, ‘cytosolic small ribosomal subunit’, ‘transmembrane transport’, ‘cell wall’ and ‘retrotransposon nucleocapsid’, are selected for comparison with the online tool GOComparator in CompoDynamics. The comparison results are illustrated for (B) nucleotide content, (C) codon usage bias, (D) amino acid usage, (E) positively/neutrally charged amino acids, (F) hydrophobicity and (G) intrinsically disordered regions.