Xiaojuan Shen, Tongcheng Huang, Guanyu Wang, Guanglin Li.
Abstract
Internal symmetry is observed in most fundamental protein folds. There is also substantial evidence that nascent polypeptide chains can begin folding co-translationally, which allows mRNA to carry additional information about protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry, to explore how gene sequences code for structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from the left-handed beta-helix fold, intragenic symmetry always exists in the corresponding gene sequences. Codon usage bias and local mRNA structure might also be involved in modulating translation speed during the formation of structural symmetry: a major decrease in local codon usage bias in the middle of the codon sequence is a common feature, and major or consecutive decreases in local mRNA folding energy can be observed near the boundaries of the symmetric substructures. The results suggest that gene duplication and fusion may be an evolutionarily conserved process for this protein fold. In addition, the usage of rare codons and the formation of higher-order secondary structure near the boundaries of symmetric substructures might have coevolved as conserved mechanisms that slow translation elongation and facilitate effective folding of the symmetric substructures. These findings provide insight into the mechanisms of translation and its evolution, as well as into the design of proteins from symmetric modules.
Year: 2015 PMID: 26641668 PMCID: PMC4671585 DOI: 10.1371/journal.pone.0144473
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Detailed information of the dataset.
| Superfamily | Protein name | Species | PDB ID | Protein segments | Gene segments | Source of genome data |
|---|---|---|---|---|---|---|
| Trimeric LpxA-like enzymes | UDP N-acetylglucosamine acyltransferase | | 1LXA | 1–198 | 1–594 | Ftp |
| Trimeric LpxA-like enzymes | UDP N-acetylglucosamine acyltransferase | | 1J2Z | 2–193 | 4–579 | Ftp |
| Trimeric LpxA-like enzymes | Galactoside acetyltransferase | | 1KRR | 62–185 | 184–555 | Ftp |
| Trimeric LpxA-like enzymes | Maltose O-acetyltransferase | | 1OCX | 55–183 | 163–549 | Ftp |
| Trimeric LpxA-like enzymes | Xenobiotic acetyltransferase | | 1XAT | 3–166 | 7–498 | Ftp |
| Trimeric LpxA-like enzymes | Xenobiotic acetyltransferase | | 1KK6 | 1–180 | 1–540 | Ftp |
| Trimeric LpxA-like enzymes | N-acetylglucosamine 1-phosphate uridyltransferase GlmU | | 1HV9 | 252–438 | 754–1314 | Ftp |
| Trimeric LpxA-like enzymes | N-acetylglucosamine 1-phosphate uridyltransferase GlmU | | 1G97 | 252–447 | 754–1341 | Ftp |
| Trimeric LpxA-like enzymes | Glucose-1-phosphate adenylyltransferase small subunit | | 1YP2 | 390–521 | 1168–1563 | Cdtable |
| Trimeric LpxA-like enzymes | gamma-carbonic anhydrase | | 1QRE | 4–174 | 10–522 | Ftp |
| Trimeric LpxA-like enzymes | Ferripyochelin binding protein | | 1V3W | 1–144 | 1–432 | Ftp |
| Trimeric LpxA-like enzymes | Putative acetyltransferase | | 1XHD | 1–144 | 1–426 | Ftp |
| Trimeric LpxA-like enzymes | Serine acetyltransferase | | 1SSQ | 138–241 | 412–723 | Ftp |
| Trimeric LpxA-like enzymes | Hypothetical protein YdcK | | 2F9C | 3–322 | 7–966 | Ftp |
| Trimeric LpxA-like enzymes | Acetyltransferase PglD | | 3BSW | 72–195 | 214–588 | Ftp |
| An insect antifreeze protein | Thermal hysteresis protein | | 1LOS | 3–90 | 7–270 | Cdtable |
| An insect antifreeze protein | Thermal hysteresis protein | | 1M8N | 2–121 | 4–363 | Cdtable |
| Adhesin YadA, collagen-binding domain | Cell adhesion | | 1P9H | 32–209 | 94–627 | Ftp |
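
The relationship between the "Protein segments" and "Gene segments" columns can be illustrated with a short sketch. It assumes that both ranges are 1-based and inclusive and that nucleotide numbering starts at the first base of codon 1; the function name `residue_range_to_nt_range` is hypothetical and not taken from the paper. Under those assumptions, rows such as 1LXA and 1HV9 follow directly from the residue ranges.

```python
from typing import Tuple


def residue_range_to_nt_range(first_residue: int, last_residue: int) -> Tuple[int, int]:
    """Map a 1-based, inclusive residue range to the 1-based, inclusive
    nucleotide range of the codons that encode it (assumed convention)."""
    if first_residue < 1 or last_residue < first_residue:
        raise ValueError("expected 1 <= first_residue <= last_residue")
    nt_start = 3 * (first_residue - 1) + 1  # first base of the first codon
    nt_end = 3 * last_residue               # last base of the last codon
    return nt_start, nt_end


print(residue_range_to_nt_range(1, 198))    # 1LXA row -> (1, 594)
print(residue_range_to_nt_range(252, 438))  # 1HV9 row -> (754, 1314)
```

The same arithmetic gives the length of a gene segment as three times the number of residues in the protein segment.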
Similarity (S) of the sequence alignment.
| S | 1LXA | 1J2Z | 1KRR | 1OCX | 1XAT | 1KK6 | 1HV9 | 1G97 | 1YP2 | 1QRE | 1V3W | 1XHD | 1SSQ | 2F9C | 3BSW | 1LOS | 1M8N | 1P9H |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1LXA | 1 | 0.46 | 0.4 | 0.36 | 0.27 | 0.21 | 0.22 | 0.23 | 0.32 | 0.27 | 0.33 | 0.33 | 0.36 | 0.38 | 0.39 | 0.44 | 0.32 | 0.25 |
| 1J2Z | 0.46 | 1 | 0.35 | 0.38 | 0.26 | 0.18 | 0.27 | 0.24 | 0.31 | 0.19 | 0.35 | 0.33 | 0.32 | 0.35 | 0.35 | 0.41 | 0.3 | 0.2 |
| 1KRR | 0.4 | 0.35 | 1 | 0.48 | 0.37 | 0.45 | 0.35 | 0.4 | 0.2 | 0.33 | 0.31 | 0.35 | 0.21 | 0.51 | 0.26 | 0.24 | 0.18 | 0.31 |
| 1OCX | 0.36 | 0.38 | 0.48 | 1 | 0.38 | 0.45 | 0.33 | 0.42 | 0.23 | 0.32 | 0.28 | 0.28 | 0.19 | 0.5 | 0.21 | 0.23 | 0.17 | 0.29 |
| 1XAT | 0.27 | 0.26 | 0.37 | 0.38 | 1 | 0.4 | 0.27 | 0.26 | 0.22 | 0.2 | 0.24 | 0.25 | 0.18 | 0.43 | 0.3 | 0.31 | 0.24 | 0.22 |
| 1KK6 | 0.21 | 0.18 | 0.45 | 0.45 | 0.4 | 1 | 0.21 | 0.19 | 0.3 | 0.15 | 0.34 | 0.27 | 0.31 | 0.36 | 0.3 | 0.4 | 0.27 | 0.16 |
| 1HV9 | 0.22 | 0.27 | 0.35 | 0.33 | 0.27 | 0.21 | 1 | 0.4 | 0.38 | 0.2 | 0.28 | 0.32 | 0.32 | 0.41 | 0.35 | 0.4 | 0.34 | 0.16 |
| 1G97 | 0.23 | 0.24 | 0.4 | 0.42 | 0.26 | 0.19 | 0.4 | 1 | 0.38 | 0.21 | 0.33 | 0.32 | 0.36 | 0.41 | 0.4 | 0.43 | 0.32 | 0.18 |
| 1YP2 | 0.32 | 0.31 | 0.2 | 0.23 | 0.22 | 0.3 | 0.38 | 0.38 | 1 | 0.27 | 0.2 | 0.21 | 0.23 | 0.45 | 0.19 | 0.28 | 0.12 | 0.26 |
| 1QRE | 0.27 | 0.19 | 0.33 | 0.32 | 0.2 | 0.15 | 0.2 | 0.21 | 0.27 | 1 | 0.34 | 0.3 | 0.35 | 0.43 | 0.31 | 0.4 | 0.27 | 0.18 |
| 1V3W | 0.33 | 0.35 | 0.31 | 0.28 | 0.24 | 0.34 | 0.28 | 0.33 | 0.2 | 0.34 | 1 | 0.4 | 0.21 | 0.48 | 0.34 | 0.34 | 0.21 | 0.26 |
| 1XHD | 0.33 | 0.33 | 0.35 | 0.28 | 0.25 | 0.27 | 0.32 | 0.32 | 0.21 | 0.3 | 0.4 | 1 | 0.31 | 0.49 | 0.19 | 0.32 | 0.21 | 0.23 |
| 1SSQ | 0.36 | 0.32 | 0.21 | 0.19 | 0.18 | 0.31 | 0.32 | 0.36 | 0.23 | 0.35 | 0.21 | 0.31 | 1 | 0.46 | 0.27 | 0.3 | 0.12 | 0.32 |
| 2F9C | 0.38 | 0.35 | 0.51 | 0.5 | 0.43 | 0.36 | 0.41 | 0.41 | 0.45 | 0.43 | 0.48 | 0.49 | 0.46 | 1 | 0.54 | 0.52 | 0.49 | 0.38 |
| 3BSW | 0.39 | 0.35 | 0.26 | 0.21 | 0.3 | 0.3 | 0.35 | 0.4 | 0.19 | 0.31 | 0.34 | 0.19 | 0.27 | 0.54 | 1 | 0.29 | 0.22 | 0.31 |
| 1LOS | 0.44 | 0.41 | 0.24 | 0.23 | 0.31 | 0.4 | 0.4 | 0.43 | 0.28 | 0.4 | 0.34 | 0.32 | 0.3 | 0.52 | 0.29 | 1 | 0.68 | 0.38 |
| 1M8N | 0.32 | 0.3 | 0.18 | 0.17 | 0.24 | 0.27 | 0.34 | 0.32 | 0.12 | 0.27 | 0.21 | 0.21 | 0.12 | 0.49 | 0.22 | 0.68 | 1 | 0.3 |
| 1P9H | 0.25 | 0.2 | 0.31 | 0.29 | 0.22 | 0.16 | 0.16 | 0.18 | 0.26 | 0.18 | 0.26 | 0.23 | 0.32 | 0.38 | 0.31 | 0.38 | 0.3 | 1 |
Results and parameters.
| PDB id | Nt sym | | Cdbias sym | | rCAI |
|---|---|---|---|---|---|
| 1LXA | 2 | 0.35 | 2 | 0.01 | 1.1702 |
| 1J2Z | 2 | 0.35 | 2 | 0.02 | 1.0726 |
| 1KRR | 2 | 0.35 | 2 | 0.01 | 0.9798 |
| 1OCX | 2 | 0.35 | 0 | 0.01 | 1.0404 |
| 1XAT | 2 | 0.35 | 2 | 0.015 | 1.4127 |
| 1KK6 | 2 | 0.35 | 2 | 0.025 | 1.1597 |
| 1HV9 | 2 | 0.35 | 2 | 0.008 | 1.2223 |
| 1G97 | 2 | 0.35 | 2 | 0.01 | 1.1626 |
| 1YP2 | 2 | 0.30 | 2 | 0.025 | 1.0605 |
| 1QRE | 2 | 0.35 | 2 | 0.005 | 1.0621 |
| 1V3W | 2 | 0.35 | 2 | 0.03 | 1.0368 |
| 1XHD | 2 | 0.35 | 2 | 0.03 | 1.1514 |
| 1SSQ | 2 | 0.35 | 0 | 0.015 | 1.1794 |
| 2F9C | 2 | 0.30 | 2 | 0.005 | 1.0452 |
| 3BSW | 2 | 0.35 | 2 | 0.015 | 1.3315 |
| 1LOS | 2 | 0.30 | weak | 0.025 | 1.0293 |
| 1M8N | 2 | 0.30 | weak | 0.020 | 1.0348 |
| 1P9H | 2 | 0.35 | 2 | 0.005 | 1.0474 |
Fig 1. Results obtained from protein 1LXA.
(A) The gene sequence. (B) The tertiary protein structure. (C) The recurrence plot of the nucleotide sequence. (D) The recurrence plot of the codon sequence.
Fig 2. Recurrence plots for 8 proteins.
The first and third rows of panels show the recurrence plots of the nucleotide sequences; the second and fourth rows show the recurrence plots of codon usage bias in the codon sequences of the corresponding proteins. The PDB id of the protein is given in each plot.
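
How a recurrence plot exposes internal repeats in a sequence can be sketched as follows. This is a minimal illustration and not the authors' procedure: it assumes a recurrence is recorded when two fixed-length windows share at least a given fraction of identical positions, and the names `recurrence_matrix`, `diagonal_density`, `window`, and `min_identity` are hypothetical. A two-fold repeat then appears as a line of recurrences parallel to the main diagonal, offset by the repeat length.

```python
from typing import List


def recurrence_matrix(seq: str, window: int = 3, min_identity: float = 1.0) -> List[List[int]]:
    """Binary recurrence matrix: entry (i, j) is 1 when the windows starting
    at i and j share at least `min_identity` of their positions (assumed rule)."""
    n = len(seq) - window + 1
    if n <= 0:
        raise ValueError("sequence is shorter than the window")
    words = [seq[i:i + window] for i in range(n)]
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            matches = sum(a == b for a, b in zip(words[i], words[j]))
            if matches / window >= min_identity:
                matrix[i][j] = matrix[j][i] = 1  # the plot is symmetric
    return matrix


def diagonal_density(matrix: List[List[int]], offset: int) -> float:
    """Fraction of recurrences along the diagonal shifted by `offset`."""
    n = len(matrix)
    if not 0 <= offset < n:
        raise ValueError("offset outside the matrix")
    hits = sum(matrix[i][i + offset] for i in range(n - offset))
    return hits / (n - offset)


# Toy sequence: a 12-nt unit duplicated once, i.e. a perfect two-fold repeat.
toy = "ATGGCTAAAGGT" * 2
plot = recurrence_matrix(toy, window=3)
print(diagonal_density(plot, 12))  # 1.0: full line at the repeat offset
```

For real gene sequences the repeat is imperfect, so a lower `min_identity` and a longer `window` would be needed; both values are choices of this sketch.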
Fig 3. Results obtained from protein 1YP2.
(A) The cartoon structure: the two symmetric substructures are shown in red and green; the extended irregular structure in the middle of the helix is shown in blue. (B) The profile of local codon usage bias: the region of decreased bias in the middle of the codon sequence is marked with a dashed box. (C) Comparison between the profile of the natural gene sequence and the average profile of the same codon sequence randomized 10 times. The blue line is the natural gene sequence and the red line is the average of the random sequences. (D) The profile of local folding free energy; the dashed box marks the region with decreased local folding free energy. (F) Comparison between the natural gene sequence and the random sequences. The blue line is the natural gene sequence and the red line is the average of the random sequences.
Fig 4. Results obtained from protein 1P9H.
(A) The cartoon structure, with the two symmetric substructures shown in red and green, respectively. (B) The profile of local codon usage bias: the region with major decreases in CAI value is marked with a dashed box. (C) Comparison between the profile of the natural gene sequence and the average profile of the same codon sequence randomized 10 times. The blue line is the natural gene sequence and the red line is the average of the random sequences. (D) and (E) The profile of local folding free energy and the comparison between the natural gene sequence and the random sequences, respectively. The blue line is the natural gene sequence and the red line is the average of the random sequences.
Fig 5. The profile of local codon usage bias and the comparison of natural gene sequences with random sequences.
The x-axis is the position in the codon sequence and the y-axis is the local CAI value. The major decrease in local codon usage bias is marked with a dashed box. The blue line is the natural gene sequence and the red line is the average of the random sequences.
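
A local CAI profile of the kind plotted here, together with a randomized baseline, can be sketched as below. The sketch makes several assumptions that are not stated in the captions: the local CAI of a window is the geometric mean of per-codon relative adaptiveness weights, the random sequences are produced by shuffling the order of the codons, and the window length is arbitrary. The names `local_cai`, `shuffled_baseline`, and the toy weight table are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def local_cai(codons: Sequence[str], weights: Dict[str, float], window: int = 15) -> List[float]:
    """Sliding-window CAI: geometric mean of codon weights in each window."""
    if window < 1 or window > len(codons):
        raise ValueError("window must be between 1 and the number of codons")
    logs = [math.log(weights[c]) for c in codons]
    return [
        math.exp(sum(logs[start:start + window]) / window)
        for start in range(len(codons) - window + 1)
    ]


def shuffled_baseline(codons: Sequence[str], weights: Dict[str, float],
                      window: int = 15, n_shuffles: int = 10, seed: int = 0) -> List[float]:
    """Average local CAI profile over codon-order shuffles (assumed null model)."""
    rng = random.Random(seed)
    totals = [0.0] * (len(codons) - window + 1)
    for _ in range(n_shuffles):
        shuffled = list(codons)
        rng.shuffle(shuffled)
        for k, value in enumerate(local_cai(shuffled, weights, window)):
            totals[k] += value
    return [t / n_shuffles for t in totals]


# Toy data: a run of low-weight codons in the middle of the sequence.
toy_weights = {"GCT": 1.0, "GCG": 0.2}  # hypothetical weights, not real values
toy_codons = ["GCT"] * 10 + ["GCG"] * 4 + ["GCT"] * 10
profile = local_cai(toy_codons, toy_weights, window=4)
baseline = shuffled_baseline(toy_codons, toy_weights, window=4)
print(profile.index(min(profile)))  # 10: the dip sits on the low-weight run
```

Comparing `profile` with `baseline` position by position shows where the natural sequence is locally less biased than expected from its overall codon composition, which is the kind of contrast the blue and red lines depict.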
Fig 6. The profile of local mRNA folding free energy and the comparison of natural gene sequences with random sequences.
The x-axis is the position in the nucleotide sequence and the y-axis is the local folding free energy. The region with a major decrease in local mRNA folding free energy is marked with a dashed box. The blue line is the natural gene sequence and the red line is the average of the random sequences.