Shigeki Mitaku, Ryusuke Sawada.
Abstract
"Life" is a particular state of matter, and matter is composed of various molecules. The state corresponding to "life" is ultimately determined by the genome sequence, and this sequence determines the conditions necessary for survival of the organism. In order to elucidate one parameter characterizing the state of "life", we analyzed the amino acid sequences encoded in the total genomes of 557 prokaryotes and 40 eukaryotes using a membrane protein prediction online tool called SOSUI. SOSUI uses only the physical parameters of the encoded amino acid sequences to make its predictions. The ratio of membrane proteins in a genome predicted by the SOSUI online tool was around 23% for all genomes, indicating that this parameter is controlled by some mechanism in cells. In order to identify the property of genome DNA sequences that is the possible cause of the constant ratio of membrane proteins, we analyzed the nucleotide compositions at codon positions and observed the existence of systematic biases distinct from those expected based on random distribution. We hypothesize that the constant ratio of membrane proteins is the result of random mutations restricted by the systematic biases inherent to nucleotide codon composition. A new approach to the biological sciences based on the holistic analysis of whole genomes is discussed in order to elucidate the principles underlying "life" at the biological system level.Entities:
Keywords: SOSUI; genome sequence; membrane protein; prediction; principle
Year: 2016 PMID: 28409082 PMCID: PMC5221513 DOI: 10.2142/biophysico.13.0_305
Source DB: PubMed Journal: Biophys Physicobiol ISSN: 2189-4779
Figure 1. Four layers and four processes that connect the layers in a biological system. A genome DNA sequence ultimately determines the phenotype of a biological organism. Amino acid sequences are synthesized according to the DNA sequence and fold into proteins, the functional units of biological systems. A biological organism is a biological system that can maintain itself and give birth to the next generation. Mutations alter the genome DNA sequence in each new generation, and the genomic pool is formed by the accumulation of many such mutations.
Figure 2. Relationship between the one-dimensional sequences of DNA and the three-dimensional structures of biological systems. The nesting of sequences in one dimension corresponds to the hierarchical structure in three dimensions in biological organisms. Each class of the hierarchical structure is encoded in a sequence of corresponding size. How the information for a three-dimensional structure is encoded in a one-dimensional sequence is one of the most important outstanding problems in the biological sciences.
Figure 3. The ratio of membrane proteins to total open reading frames (ORFs) in each genome is plotted against the total number of ORFs for 557 prokaryotes (black squares) and 40 eukaryotes (gray circles). The list of all species is given in the Supplementary Material. The average ratios for prokaryotes and eukaryotes are 0.228 and 0.240, with standard deviations of 0.029 and 0.036, respectively. This constant ratio suggests that mutations in genome sequences are regulated by some mechanism in cells. It provides order in genome sequences throughout the biological kingdom and may represent one parameter defining the state of “life”.
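The per-genome ratio and the summary statistics in the Figure 3 caption can be computed from raw counts along the lines of the sketch below. The function names are hypothetical, and the caption does not say whether its standard deviations are sample or population values, so the sample SD used here is an assumption. The final loop computes the coefficient of variation (SD divided by mean) from the reported values, one simple way to express how "constant" the ratio is.

```python
import statistics


def membrane_ratio(n_membrane: int, n_orfs: int) -> float:
    """Fraction of a genome's ORFs predicted to encode membrane proteins."""
    return n_membrane / n_orfs


def summarize(ratios: list[float]) -> tuple[float, float, float]:
    """Mean, standard deviation and coefficient of variation (SD / mean)."""
    mean = statistics.mean(ratios)
    sd = statistics.stdev(ratios)  # sample SD (n - 1): an assumption, not stated in the caption
    return mean, sd, sd / mean


# Coefficient of variation from the summary values reported in the caption.
for group, mean, sd in [("prokaryotes", 0.228, 0.029), ("eukaryotes", 0.240, 0.036)]:
    print(f"{group}: CV = {sd / mean:.3f}")
```

The loop prints CV = 0.127 for prokaryotes and CV = 0.150 for eukaryotes, so the spread across genomes is roughly 13 to 15% of the mean ratio.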
Figure 4. Distribution of nucleotide compositions at the first (A), second (B), and third (C) positions of the triplet codon, and the average value across all positions (D), plotted against genomic GC content. The circle, diamond, rectangle, and triangle symbols indicate nucleotides A, G, C, and T, respectively. Closed and open symbols indicate data from prokaryotic and eukaryotic genomes, respectively. For clarity, A and G are plotted in the upper panel, and C and T in the lower panel. Dotted lines indicate the nucleotide compositions of completely random sequences: dark dotted line for nucleotide G or T, light dotted line for nucleotide A or C.
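The quantities in Figure 4 can be sketched as follows, assuming in-frame ORF nucleotide sequences as input. For a random sequence with GC content g, the expected fractions are G = C = g/2 and A = T = (1 − g)/2, which is why each panel needs only two dotted reference lines. The function names are hypothetical; ambiguous bases and trailing partial codons are simply dropped, and genomic GC content would normally be computed from the whole genome rather than from the ORFs alone.

```python
from collections import Counter

BASES = "ACGT"


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0


def codon_position_composition(orfs: list[str]) -> list[dict[str, float]]:
    """Base fractions at codon positions 1, 2 and 3, pooled over all ORFs.

    Each ORF is read in frame from its first base; a trailing partial
    codon and ambiguous bases such as N are ignored.
    """
    counts = [Counter() for _ in range(3)]
    for orf in orfs:
        orf = orf.upper()
        usable = len(orf) - len(orf) % 3
        for i in range(usable):
            if orf[i] in BASES:
                counts[i % 3][orf[i]] += 1
    comps = []
    for c in counts:
        total = sum(c.values())
        comps.append({b: (c[b] / total if total else 0.0) for b in BASES})
    return comps


def average_composition(comps: list[dict[str, float]]) -> dict[str, float]:
    """Unweighted mean of the per-position compositions (cf. panel D)."""
    return {b: sum(c[b] for c in comps) / len(comps) for b in BASES}


def random_expectation(gc: float) -> dict[str, float]:
    """Composition of a random sequence with GC content `gc` (dotted lines)."""
    return {"A": (1 - gc) / 2, "T": (1 - gc) / 2, "G": gc / 2, "C": gc / 2}
```

A systematic bias of the kind described in the abstract shows up as a consistent difference between `codon_position_composition(orfs)[k][b]` and `random_expectation(g)[b]` across many genomes.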