Sergio Branciamore, Grigoriy Gogoshin, Massimo Di Giulio, Andrei S Rodin.
Abstract
The identity/recognition of tRNAs, in the context of aminoacyl tRNA synthetases (and other molecules), is a complex phenomenon that has major implications ranging from the origins and evolution of translation machinery and genetic code to the evolution and speciation of tRNAs themselves to human mitochondrial diseases to artificial genetic code engineering. Deciphering it via laboratory experiments, however, is difficult and necessarily time- and resource-consuming. In this study, we propose a mathematically rigorous two-pronged in silico approach to identifying and classifying tRNA positions important for tRNA identity/recognition, rooted in machine learning and information-theoretic methodology. We apply Bayesian Network modeling to elucidate the structure of intra-tRNA-molecule relationships, and distribution divergence analysis to identify meaningful inter-molecule differences between various tRNA subclasses. We illustrate the complementary application of these two approaches using tRNA examples across the three domains of life, and identify and discuss important (informative) positions therein. In summary, we deliver to the tRNA research community a novel, comprehensive methodology for identifying the specific elements of interest in various tRNA molecules, which can be followed up by the corresponding experimental work and/or high-resolution position-specific statistical analyses.
Keywords: bayesian networks; distribution divergence; information theory; operational code; tRNA identity; tRNA recognition
Year: 2018 PMID: 29419741 PMCID: PMC5871937 DOI: 10.3390/life8010005
Source DB: PubMed Journal: Life (Basel) ISSN: 2075-1729
Figure 1. tRNA sequence alignment. The first row corresponds to the “standard” numbering scheme. The second row is the consensus sequence. Structural parts of tRNA molecules are highlighted in color (see text for details).
Figure 2. Bayesian network built from the full set of tRNA sequences. (a) Direct visualization of the PDAG (Probabilistic Directed Acyclic Graph); (b) the same network superimposed on the secondary tRNA structure. Nodes in the network correspond to the variables (specifically, tRNA positions, as enumerated in Figure 1, first row), and edges to the dependencies between the variables. The “boldness” of an edge is proportional to the dependency strength, which is also indicated by the number shown next to the edge. See text for BN construction details. (The label “100” does not refer to a tRNA position; it is a placeholder for the cognate aa variable and appears here for technical convenience only. Likewise, the directionality of an edge (arrow) is for mathematical convenience only and does not imply causation.)
Figure 3. Position “importance” profile for Gly tRNAs, shown for the three domains of life: Archaea (a,d), Bacteria (b,e) and Eukarya (c,f). Relative Entropy is shown as a function of tRNA position (a–c) (enumerated as in Figure 1, first row), or visualized as color intensity superimposed on the secondary tRNA structure (d–f). The significance cutoff is shown as a red line in (a–c); see text for discussion.
Figure 4. Summary visualization of tRNA position “importance” for all aa tRNA subclasses, shown for the three domains of life: Archaea (a), Bacteria (b) and Eukarya (c). Higher values correspond to “hotter” colors. tRNA position numbering is as in Figure 3. This figure summarizes the detailed plots presented in Supplemental Figures S1–S22.
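The per-position Relative Entropy profiles in Figures 3 and 4 can be illustrated with a minimal sketch. It assumes that "distribution divergence" is computed as the Kullback-Leibler divergence between the symbol distribution of one tRNA subclass and that of a background set at each alignment column. The alphabet, the pseudocount, the choice of background, and all function names are assumptions made for illustration; this record does not specify the authors' exact estimator.

```python
import math
from collections import Counter

# Assumed alphabet: four nucleotides plus the alignment gap symbol.
# Sequences are assumed to contain only these symbols.
ALPHABET = "ACGU-"

def column_distribution(sequences, pos, pseudocount=0.5):
    """Smoothed symbol frequencies at one column of an alignment."""
    counts = Counter(seq[pos] for seq in sequences)
    total = len(sequences) + pseudocount * len(ALPHABET)
    return {s: (counts.get(s, 0) + pseudocount) / total for s in ALPHABET}

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q), in bits."""
    return sum(p[s] * math.log2(p[s] / q[s]) for s in ALPHABET if p[s] > 0)

def importance_profile(subclass, background, pseudocount=0.5):
    """Relative entropy of subclass vs. background for every column."""
    length = len(subclass[0])
    return [
        relative_entropy(
            column_distribution(subclass, i, pseudocount),
            column_distribution(background, i, pseudocount),
        )
        for i in range(length)
    ]

# Toy aligned sequences (not real tRNA data): only column 0 differs.
toy_subclass = ["GCA", "GCA", "GCA", "GCA"]
toy_background = ["GCA", "ACA", "UCA", "CCA"]
profile = importance_profile(toy_subclass, toy_background)
```

In this toy input only the first column separates the subclass from the background, so only that column receives a non-zero score. The pseudocount keeps the divergence finite when a symbol is absent from the background column.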
Determinant tRNA positions classified by amino acid and by the three domains of life (Archaea, Arc; Bacteria, Bac; Eukarya, Euk). Only positions with a Relative Entropy (RE) value higher than the preset cutoff value (set at RE approx. 0.2, as illustrated in Figure 3 and Supplemental Figures S1–S22) are shown. Note that the preset cutoff value is higher than that suggested by the “random noise” sequence (see text). For compatibility with the literature, the position numbering system follows that of [23,24] and corresponds to the first row in Figure 1. Paired bases in the secondary structure of tRNA are shown connected by a dash; for example, the 3-70 pair indicates that the bases at positions 3 and 70 should be paired in the secondary structure of the tRNA molecule [23,24]. Similarly, “=” indicates tertiary interactions among tRNA nucleotides. Question marks indicate uncertain positions, that is, positions that might not be important for the tRNA molecule. “Confirmed” indicates that nucleotides identified as “important”, or as identity determinants, in the experimental analyses [2,25] were also significant (high RE values) in our analysis. “Low significance” means that nucleotides experimentally shown to be important or to be identity determinants were not associated with high RE values. Conversely, “Suggested” positions exhibit high RE values but have so far not been reported in the experimental literature.
| Amino Acid | Confirmed | Low Significance | Suggested | Domain |
|---|---|---|---|---|
| 2-71; 3-70; 4-69; 9; 12-23; 13-22 | Arc | |||
| 20a; 30-40; 35; 36; 44; 47 | ||||
| Ala | 2-71; 3-70; 4-69; 20; 64; 73 | 17; 20a; 29-41; 35; 36; 44; 51-63 | Bac | |
| 3-70 | 2-71; 4-69; 5-68; 9; 12-23; 13-22; 20a; 27-43 | Euk | ||
| 29-41; 31-39; 32; 35; 36; 38; 59 | ||||
| 4-69; 20; 35; 36; 73 | Arc | |||
| Arg | 20; 20A; | 38; 73 | 4-69; 5-68; 20a; 35; 36 | Bac |
| 4-69; 15; 20; 35; 36; 48; 71 | Euk | |||
| 2-71; 3-70; 9; 11-24; 12-23; 17; 17a | Arc | |||
| 20a; 22; 31-39; 34; 35; 36; 37 | ||||
| 46; 47; 51-63; 59; 73 | ||||
| Asn | 73 | 1-72; 2-71; 3-70; 12-23; 31-39 | Bac | |
| 32; 34; 35; 36; 51-63 | ||||
| 2-71; 4-69; 13-22; 17; 27-43; 29-41 | Euk | |||
| 31-39; 34; 35; 36; 38; 49-65 | ||||
| 50-64; 51-63; 59; 73 | ||||
| 2-71; 3-70; 6-67; 11-24; 12-23; 13-22; 17a | Arc | |||
| 20b; 20a; 20; 25; 28-42; 34; 35 | ||||
| 36; 44; 46; 47; 49-65; 64; 73 | ||||
| Asp | 25; 38; 73 | 2-71; 10 | 11-24; 20a; 31-39; 34; 35; 36 | Bac |
| 43; 44; 50-64; 51-63; 65 | ||||
| 9-12-23; 25; 38; 73 | 10 | 1-72; 11-24; 13-22; 20a; 26; 28-42 | Euk | |
| 29-41; 31-39; 34; 35; 36; 46 | ||||
| 47; 49-65; 50-64; 59; 63; 71 | ||||
| 3-70; 4-69; 5-68; 12-23; 13-22; 17 | Arc | |||
| 17a; 20; 21; 24; 27-43; 34 | ||||
| 35; 36; 45; 46; 47; 73 | ||||
| Cys | 2-71; 3-70; 13-22; 46; 73 | 15; 48 | 12-23; 17; 29-41; 34; 35; 36 | Bac |
| 43; 45; 47; 51-63; 71 | ||||
| 12-23; 73 | 20a | 2-71; 3-70; 6-67; 7-66; 9; 13-22; | Euk | |
| 26; 29-41; 31-39; 34; 35; 36; 37 | ||||
| 38; 51-63; 59; 68; 69 | ||||
| 1-72; 2-71; 3-70; 4-69; 11-24; 12-23; 13-22 | Arc | |||
| 17; 17a; 20b; 20a; 25; 34; 35 | ||||
| 36; 37; 44; 46; 47; 73 | ||||
| Gln | 1-72; 38; 73 | 2-71; 3-70; 10-25; 37 | 12-23; 13-22; 20a; 34; 35; 36 | Bac |
| 44; 45; 46; 51-63; 65 | ||||
| 2-71; 3-70; 6-67; 7-66; 11-24; 12-23 | Euk | |||
| 13-22; 26; 29-41; 31-39; 34; 35 | ||||
| 36; 44; 46; 47; 52-62; 73 | ||||
| 2-71; 3-70; 11-24; 12-23; 13-22; 17a; 20a; 20b | Arc | |||
| 25; 34; 35; 36; 46; 47; 49-65 | ||||
| Glu | 11-24; 13; 46; 47; 71 | 1-72; 22; 33; 37 | 3-70; 4-69; 5-68; 7-66; 9; 12-23 | Bac |
| 17; 20a; 30-40; 34; 35; 36 | ||||
| 38; 45; 49-65; 51-63 | ||||
| 1-72; 2-71; 3-70; 5-68; 11-24; 12-23 | Euk | |||
| 13-22; 25; 26; 31-39; 34; 35 | ||||
| 36; 38; 47; 59 | ||||
| 2-71; 3-70; 11-24; 13-22 | Arc | |||
| 31-39; 35; 36; 49-65 | ||||
| Gly | 2-71; 3-70; 73 | 1-72; 10-25 | 29-41; 31-39; 35; 36; 63 | Bac |
| 2-71; 3-70 | 73 | 11-24; 25; 31-39; 35; 36; 47; 59 | Euk | |
| -1; 2-71; 3-70; 5-68; 11-24; 12-23 | Arc | |||
| 34; 35; 36; 37; 50-64; 73 | ||||
| His | -1; 73 | 2-71; 3-70; 4-69; 6-67; 31-39 | Bac | |
| 32; 34; 35; 36; 38; 63 | ||||
| -1; 73 | 2-71; 9; 11-24; 12-23; 13-22; 26 | Euk | ||
| 30-40; 31-39; 32; 34; 35; 36 | ||||
| 37; 38; 44; 45; 46; 47 | ||||
| 2-71; 3-70; 11-24; 12-23; 29-41; 31-39 | Arc | |||
| 34; 35; 36; 37; 46; 73 | ||||
| Ile | 12-23; 29-41 | 4-69; 24; 37; 38; 73 | 3-70; 6-67; 13-22; 20a; 27-43; 28-42 | Bac |
| 34; 35; 36; 44; 51-63 | ||||
| 4-69; 17; 20a; 28-42; 29-41 | Euk | |||
| 30-40; 34; 35; 36; 60 | ||||
| 1-72; 2-71; 9; 11-24; 17; 17a; 20 | Arc | |||
| 20b; 27-43; 31-39; 34; 35; 36; 37 | ||||
| 47; 51-63; 57; 64; 73 | ||||
| Ini | 2-71; 3-70; 32 | 33; 37 | 1-72; 5-68; 6-67; 11-24; 12-23; 17a | Bac |
| 26; 27-43; 29-41; 31-39; 34; 35 | ||||
| 36; 44; 57; 59; 73 | ||||
| 1-72; 2-71; 3-70; 4-69; 5-68; 6-67; 7-66 | Euk | |||
| 12-23; 20; 20a; 22; 27-43; 29-41; 31-39 | ||||
| 33; 34; 35; 36; 38; 46; 51-63 | ||||
| 54; 59; 60; 73 | ||||
| 2-71; 3-70; 4-69; 5-68; 9; e11 | Arc | |||
| 12-23; 13-22; 20a; 20b; e21; 31-39 | ||||
| 35; 36; 37; 44; ; 46 | ||||
| Leu | 20A; 73 | 20; 38; | 2-71; e5; 9; e11; 12-23; 13-22; 15; 20a; 21 | Bac |
| e21; 35; 36; 44; ; 46; 47; 48; 73 | ||||
| 4-69; e5; e11; 12-23; 13-22; 20a | Euk | |||
| 20b; e21; 29-41; 35; 36; 37 | ||||
| 44; ; 45; 47; 49-65; 68 | ||||
| 2-71; 3-70; 4-69; 9; 11-24; 12-23; 22 | Arc | |||
| 31-39; 34; 35; 36; 37; 46; 73 | ||||
| Lys | 73? | 4-69; 5-68; 7-66; 12-23; 20a; 26 | Bac | |
| 31-39; 34; 35; 36; 73 | ||||
| 2-71; 7-66; 9; 12-23; 13-22; 17; 20a; 29-41 | Euk | |||
| 31-39; 34; 35; 36; 44; 59; 70 | ||||
| 31-39; 34; 35; 36; 37; 73 | Arc | |||
| Met | 73 | 4-69; 5-68; 38 | 31-39; 34; 35; 36; 71 | Bac |
| 20 | 73 | 1-72; 12-23; 31-39; 34; 35; 36; 60; 64 | Euk | |
| 9; 12-23; 13-22; 20; 20a; 34; 35 | Arc | |||
| 36; 37; 45; 46; 47; 73 | ||||
| Phe | 27-43; 31-39; 44; 45; 59 | 20?; 28-42; 30-40; 37; 39?; 43?; 60; 73? | 3-70; 12-23; 17; 20a; 34; 35 | Bac |
| 36; 39; 43; 51-63; 73 | ||||
| 20; 31-39; 37 | 73? | 2-71; 4-69; 5-68; 6-67; 9; 12-23 | Euk | |
| 13-22; 17; 20a; 29-41; 34; 35 | ||||
| 36; 51-63; 59; 60; 73 | ||||
| 2-71; 3-70; 6-67; 11-24; 12-23; 13-22 | Arc | |||
| 17a; 25; 35; 36; 37; 46 | ||||
| Pro | 72; 73 | 15; 48 | 1-72; 2-71; 3-70; 17a; 35 | Bac |
| 36; 37; 44; 59 | ||||
| 2-71; 11-24; 12-23; 13-22; 20a; 25 | Euk | |||
| 26; 27-43; 29-41; 31-39; 32; 35 | ||||
| 36; 37; 38; 49-65; 73 | ||||
| 3-70; 4-69; 7-66; 9; 10-25; e11; 11-24; 12-23 | Arc | |||
| 13-22; 14; 15; 16; 17a; 17; 20b; 20 | ||||
| 20a; 21; e21; 27-43; 28-42; 31-39; 34; 35 | ||||
| 36; 37; 44; ; 48; 49-65; 50-64; 59 | ||||
| 67; 68; 72; 73 | ||||
| Sec | 2-71; 3-70; 4-69; 5-68; e5; 7-66 | 1-72; e2; e3; e4; 6-67; 12-23; e12; e13 | 14; 15; 16; 20a; 29-41; 31-39; 34; 35 | Bac |
| 8; 9; 10-25; 11-24; e11; e17 | 13-22; 20; e22; e23 | 36; ; 59; 63; 64; 66; 68 | ||
| e21; e27; 45; 48; 73; e24; e25 | ||||
| e26; 50-64; 64?; 66?; 68? | ||||
| 2-71; 4-69; 5-68; 9; 10-25; e11; 14; 20a | Euk | |||
| e21; 21; 23; 26; 27-43; 28-42; 29-41; 31-39 | ||||
| 34; 35; 36; 38; 44; ; 46; 47 | ||||
| 48; 49-65; 50-64; 51-63; 59; 66; 67; 73 | ||||
| 2-71; 4-69; 5-68; e11; 12-23; e21; 22; 24 | Arc | |||
| 35; 36; 44; ; 46; 47; 73 | ||||
| Ser | 2-71; 3-70; e4-69;11; e21; 44; 73 | 11-24 | 5-68; 12-23; 13-22; 20a; 20b; 35 | Bac |
| e4; e5?;e12; e13; e14; e15; e16; e22 | 36; ; 46; 47; 51-63; 59 | |||
| e23; e24; e25; e26; 69? e2; e3; | ||||
| e11; e21; e2; e3; e4; e12; e13; e22; e23 | 4-69; 13-22; 20a; 23; 27-43; 35; 36; 44 | Euk | ||
| ; 46; 47; 49-65; 51-63; 59; 73 | ||||
| 2-71; 3-70; 11-24; 12-23; 35 | Arc | |||
| 36; 37; 46; 73 | ||||
| Thr | 2-71; 3-70 | 1-72; 73 | 20a; 35; 36 | Bac |
| 1-72 | 31-39; 35; 36; 73 | Euk | ||
| 2-71; 3-70; 6-67; 22; 27-43 | Arc | |||
| 31-39; 34; 35; 36; 50-64 | ||||
| Trp | 1-72; 3-70; 73 | 2-71; 5-68; 9 | 15; 20a; 29-41; 31-39; 34; 35; 36; 48 | Bac |
| 2-71; 15; 20a; 31-39; 34; 35 | Euk | |||
| 36; 43; 48; 52-62; 65 | ||||
| 1-72; 4-69; 9; 12-23; 13-22; 31-39; 34 | Arc | |||
| 35; 36; 37; 46; 47; 51-63; 73 | ||||
| Tyr | 73 | e5; 6-67; 10-25; e11; 12-23; 13-22; 17; 20 | Bac | |
| 20a; 20b; e21; 27-43; 28-42; 31-39; 34; 35 | ||||
| 36; 44; ; 46; 59; 71 | ||||
| 1-72 | 73 | 12-23; 17; 27-43; 28-42; 31-39 | Euk | |
| 34; 35; 36; 51-63; 70 | ||||
| 2-71; 3-70; 4-69; 5-68; 6-67; 11-24; 12-23 | Arc | |||
| 20a; 30-40; 31-39; 35; 36; 47 | ||||
| Val | 73 | 3-70; 4-69 | 13-22; 35; 36 | Bac |
| 3-70; 11-24; 12-23; 13-22; 27-43 | Euk | |||
| 31-39; 35; 36; 38; 60 |
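
The three categories defined in the table caption follow a simple decision rule, sketched below under stated assumptions. The function name, the input layout, and the RE values in the example are hypothetical; only the approximate cutoff of 0.2 and the category definitions come from the caption.

```python
RE_CUTOFF = 0.2  # approximate cutoff quoted in the table caption

def classify_positions(re_values, experimental, cutoff=RE_CUTOFF):
    """Sort positions into the caption's three categories.

    re_values    -- dict mapping a position label (e.g. "3-70") to its RE value
    experimental -- set of labels reported as determinants in experiments
    Positions that are neither experimental nor above the cutoff are dropped.
    Experimental positions without an RE value are left unclassified here.
    """
    result = {"Confirmed": [], "Low significance": [], "Suggested": []}
    for label, value in re_values.items():
        high = value > cutoff
        if label in experimental:
            key = "Confirmed" if high else "Low significance"
            result[key].append(label)
        elif high:
            result["Suggested"].append(label)
    return result

# Hypothetical RE values, invented for illustration only (not from the paper).
example_re = {"2-71": 0.45, "1-72": 0.05, "35": 0.60, "8": 0.01}
example_known = {"2-71", "1-72"}
example = classify_positions(example_re, example_known)
```

With these invented inputs, "2-71" is Confirmed, "1-72" is Low significance, "35" is Suggested, and "8" is omitted because it is below the cutoff and has no experimental support.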