Aleksandra E Korenskaia1,2,3, Yury G Matushkin2,3, Sergey A Lashin1,2,3, Alexandra I Klimenko1,2.
Abstract
Protein abundance is crucial for the majority of genetically regulated cell functions to act properly in prokaryotic organisms. Therefore, developing bioinformatic methods for assessing the efficiency of different stages of gene expression is of great importance for predicting the actual protein abundance. One of these steps is the evaluation of translation elongation efficiency based on mRNA sequence features, such as codon usage bias and mRNA secondary structure properties. In this study, we have evaluated correlation coefficients between experimentally measured protein abundance and predicted elongation efficiency characteristics for 26 prokaryotes, including non-model organisms, belonging to diverse taxonomic groups. The algorithm for assessing elongation efficiency takes into account not only codon bias, but also the number and energy of secondary structures in mRNA if those demonstrate an impact on the predicted elongation efficiency of the ribosomal protein genes. The results show that, for a number of organisms, secondary structures are a better predictor of protein abundance than codon usage bias. The bioinformatic analysis has revealed several factors associated with the value of the correlation coefficient. The first factor is the elongation efficiency optimization type: organisms whose genomes are optimized for codon usage only have significantly higher correlation coefficients. The second factor is taxonomic identity: bacteria that belong to the class Bacilli tend to have higher correlation coefficients among the analyzed set. The third is growth rate, which is shown to be higher for the organisms with higher correlation coefficients between protein abundance and predicted translation elongation efficiency. The obtained results can be useful for further improvement of methods for protein abundance prediction.
Keywords: protein abundance prediction; translation elongation efficiency; translation in prokaryotes
Year: 2022 PMID: 36233299 PMCID: PMC9570070 DOI: 10.3390/ijms231911996
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Values of the analyzed parameters for the studied organisms: elongation efficiency type (EEI type), which was obtained by EloE (see the details in Materials and Methods section); coverage of proteomic data; Spearman correlation coefficients between protein abundance and base EEI index; corresponding p-value; minimal doubling time (see the references in Table 4); mean (M) and standard deviation (R) values of ranks of ribosomal protein genes measured on the base EEI scale.
| Organism | EEI Type | Coverage | Correlation Coefficient | p-Value | Doubling Time (h) | M_Main | R_Main |
|---|---|---|---|---|---|---|---|
| | 1 | 62.5 | 0.66 | 8.46 × 10^−211 | 0.4 | 83 | 25 |
| | 1 | 39.4 | 0.65 | 4.01 × 10^−202 | 0.5 | 94 | 12 |
| | 1 | 75.9 | 0.63 | 3.60 × 10^−141 | 0.6667 | 91 | 26 |
| | 1 | 57.1 | 0.60 | 2.58 × 10^−128 | 0.5 | 76 | 49 |
| | 1 | 15.9 | 0.57 | 2.52 × 10^−67 | 2.7 | 91 | 20 |
| | 1 | 16.4 | 0.57 | 1.46 × 10^−41 | 0.5167 | 79 | 36 |
| | 1 | 97.40 | 0.57 | 0 | 0.3333 | 87 | 30 |
| | 4 | 26.20 | 0.52 | 3.42 × 10^−102 | 0.5 | 77 | 46 |
| | 2 | 47.40 | 0.46 | 6.48 × 10^−42 | 2.4667 | 67 | 37 |
| | 1 | 56.3 | 0.45 | 1.79 × 10^−126 | 0.5 | 85 | 39 |
| | 1 | 62.2 | 0.44 | 3.71 × 10^−66 | 4.5 | 77 | 32 |
| | 4 | 25.2 | 0.42 | 2.50 × 10^−05 | 3.3 | 66 | 43 |
| | 1 | 37.8 | 0.40 | 8.01 × 10^−48 | 5.8 | 53 | 51 |
| | 1 | 29.6 | 0.40 | 4.86 × 10^−47 | 1 | 91 | 26 |
| | 4 | 27.1 | 0.39 | 3.87 × 10^−37 | 2.48 | 79 | 18 |
| | 1 | 38.5 | 0.38 | 6.58 × 10^−48 | 2.6 | 87 | 28 |
| | 4 | 85.7 | 0.35 | 1.21 × 10^−41 | 3 | 61 | 41 |
| | 2 | 66.2 | 0.35 | 6.06 × 10^−66 | 8.2 | 59 | 42 |
| | 4 | 54.2 | 0.33 | 2.00 × 10^−02 | 11 | 36 | 41 |
| | 2 | 98.8 | 0.28 | 1.22 × 10^−29 | 0.8333 | 51 | 44 |
| | 3 | 43.6 | 0.27 | 1.37 × 10^−42 | 0.5 | 83 | 17 |
| | 4 | 84 | 0.26 | 3.44 × 10^−28 | 14.7 | 36 | 63 |
| | 4 | 79.00 | 0.24 | 1.60 × 10^−06 | 46 | 55 | 46 |
| | 2 | 60.9 | 0.14 | 1.14 × 10^−83 | 8 | 34 | 54 |
| | 2 | 41.9 | 0.12 | 1.73 × 10^−05 | 5 | 42 | 47 |
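As an illustration of the statistic reported in the correlation column, the following is a minimal sketch of Spearman's rank correlation between a per-gene elongation efficiency index and measured protein abundance. It is not the authors' EloE pipeline: the function names and the toy values are hypothetical, and the coefficient is computed as the Pearson correlation of average ranks, which is the standard definition when ties are present.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: one EEI score and one abundance value per gene.
eei = [0.91, 0.40, 0.75, 0.10, 0.62]
abundance = [1200.0, 85.0, 300.0, 12.0, 950.0]
rho = spearman(eei, abundance)
```

The sketch raises a division error when either input is constant, because the rank variance is then zero; a real analysis would also need a p-value, which a statistics library can provide.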
Figure 1. Spearman’s correlation coefficients between protein abundance and EEI for 25 prokaryotes.
Figure 2. Distribution of Spearman’s correlation coefficients between protein abundance and EEI among 5 EEI types for 25 prokaryotes (Neisseria meningitidis, the organism with p-value > 0.05, is excluded).
Figure 3. The distributions of the Spearman’s correlation coefficient values between protein abundance and EEI for the organisms belonging to the different elongation efficiency optimization types that take into account: number of secondary structures (EEI2, panel (a)), energy of secondary structures (EEI3, panel (b)), codon bias and number of secondary structures (EEI4, panel (c)). All these distributions are compared with the correlation between protein abundance and the codon bias-based index (EEI1) calculated for the same organisms.
The table presents Pearson’s correlations between measured protein abundance and elongation efficiency indices calculated by EloE, with the corresponding p-values. These correlation coefficients have been evaluated using several EEI values: (a) EEI values calculated for the base EEI type of the studied organism are presented in the “r base” column with the p-value in the “Pval base” column; the base EEI type is presented in the “EEI type” column and indicates which of the five EEI types is determined as the type of the organism under study; (b) EEI values calculated for each EEI type (1–5), with the correlation coefficients and the p-values presented in the “r” and “Pval” columns for each type, respectively.
| Organism | EEI Type | r Base | Pval Base | r1 | Pval 1 | r2 | Pval 2 | r3 | Pval 3 | r4 | Pval 4 | r5 | Pval 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1 | 0.57 | 2.52 × 10^−67 | 0.57 | 2.52 × 10^−67 | 0 | 9.07 × 10^−1 | 0.08 | 2.95 × 10^−2 | 0.45 | 2.49 × 10^−39 | 0.21 | 2.58 × 10^−9 |
| | 1 | 0.38 | 6.58 × 10^−48 | 0.38 | 6.58 × 10^−48 | 0.19 | 5.05 × 10^−12 | 0.16 | 4.75 × 10^−9 | 0.42 | 2.77 × 10^−59 | 0.37 | 1.88 × 10^−43 |
| | 1 | 0.57 | 0 | 0.57 | 0 | 0 | 7.71 × 10^−1 | −0.04 | 4.66 × 10^−3 | 0.5 | 1.37 × 10^−252 | 0.38 | 6.89 × 10^−142 |
| | 1 | 0.6 | 2.58 × 10^−128 | 0.6 | 2.58 × 10^−128 | 0.29 | 1.37 × 10^−26 | −0.16 | 6.93 × 10^−9 | 0.6 | 4.20 × 10^−127 | 0.04 | 1.55 × 10^−1 |
| | 1 | 0.57 | 1.46 × 10^−41 | 0.57 | 1.46 × 10^−41 | 0.1 | 3.24 × 10^−2 | −0.06 | 2.02 × 10^−1 | 0.53 | 3.84 × 10^−35 | 0.23 | 6.85 × 10^−7 |
| | 1 | 0.45 | 1.79 × 10^−126 | 0.45 | 1.79 × 10^−126 | 0.16 | 2.70 × 10^−15 | 0.17 | 1.10 × 10^−18 | 0.46 | 8.48 × 10^−133 | 0.44 | 4.92 × 10^−123 |
| | 1 | 0.65 | 4.01 × 10^−202 | 0.65 | 4.01 × 10^−202 | 0.05 | 4.94 × 10^−2 | 0.24 | 5.17 × 10^−24 | 0.62 | 8.03 × 10^−178 | 0.6 | 1.71 × 10^−162 |
| | 1 | 0.66 | 8.46 × 10^−211 | 0.66 | 8.46 × 10^−211 | 0.22 | 7.89 × 10^−20 | −0.16 | 4.63 × 10^−11 | 0.59 | 2.74 × 10^−159 | 0.05 | 3.43 × 10^−2 |
| | 1 | 0.63 | 3.60 × 10^−141 | 0.63 | 3.60 × 10^−141 | 0.08 | 6.81 × 10^−3 | −0.12 | 2.22 × 10^−05 | 0.59 | 3.28 × 10^−121 | 0.2 | 2.36 × 10^−13 |
| | 1 | 0.4 | 8.01 × 10^−48 | 0.4 | 8.01 × 10^−48 | 0.09 | 1.02 × 10^−3 | −0.03 | 2.71 × 10^−1 | 0.3 | 2.11 × 10^−26 | 0.11 | 2.02 × 10^−4 |
| | 1 | 0.44 | 3.71 × 10^−66 | 0.44 | 3.71 × 10^−66 | −0.02 | 4.18 × 10^−1 | −0.16 | 3.25 × 10^−9 | 0.37 | 6.11 × 10^−44 | 0.11 | 6.38 × 10^−5 |
| | 1 | 0.4 | 4.86 × 10^−47 | 0.4 | 4.86 × 10^−47 | 0.01 | 7.23 × 10^−1 | 0.06 | 4.22 × 10^−2 | 0.36 | 6.63 × 10^−39 | 0.31 | 2.53 × 10^−27 |
| | 2 | 0.12 | 1.73 × 10^−5 | 0.22 | 6.07 × 10^−16 | 0.12 | 1.73 × 10^−5 | 0.15 | 7.05 × 10^−8 | 0.35 | 7.86 × 10^−39 | 0.34 | 2.38 × 10^−36 |
| | 2 | 0.46 | 6.48 × 10^−42 | −0.15 | 3.25 × 10^−5 | 0.46 | 6.48 × 10^−42 | −0.21 | 8.67 × 10^−9 | 0.34 | 5.60 × 10^−22 | −0.24 | 1.93 × 10^−11 |
| | 2 | 0.28 | 1.22 × 10^−29 | 0.07 | 5.65 × 10^−3 | 0.28 | 1.22 × 10^−29 | −0.16 | 3.60 × 10^−10 | 0.39 | 1.06 × 10^−57 | −0.12 | 4.31 × 10^−6 |
| | 2 | 0.35 | 6.06 × 10^−66 | −0.28 | 2.65 × 10^−40 | 0.35 | 6.06 × 10^−66 | −0.16 | 6.61 × 10^−15 | 0.46 | 8.46 × 10^−117 | −0.25 | 2.45 × 10^−34 |
| | 2 | 0.24 | 1.60 × 10^−6 | 0.06 | 2.14 × 10^−1 | 0.24 | 1.60 × 10^−6 | 0.01 | 8.88 × 10^−1 | 0.26 | 3.25 × 10^−7 | 0.03 | 6.04 × 10^−1 |
| | 3 | 0.27 | 1.37 × 10^−42 | 0.09 | 1.76 × 10^−5 | 0.25 | 2.09 × 10^−36 | 0.27 | 1.37 × 10^−42 | 0.28 | 1.24 × 10^−44 | 0.3 | 6.09 × 10^−53 |
| | 4 | 0.52 | 3.42 × 10^−102 | 0.53 | 3.42 × 10^−102 | 0.1 | 1.86 × 10^−4 | −0.17 | 1.57 × 10^−10 | 0.52 | 1.01 × 10^−94 | −0.1 | 1.39 × 10^−4 |
| | 4 | 0.35 | 1.21 × 10^−41 | 0.37 | 1.21 × 10^−41 | 0.09 | 1.91 × 10^−3 | −0.06 | 4.79 × 10^−2 | 0.35 | 1.15 × 10^−38 | 0.13 | 2.43 × 10^−6 |
| | 4 | 0.39 | 3.87 × 10^−37 | 0.4 | 3.87 × 10^−37 | 0.13 | 5.22 × 10^−5 | 0.18 | 3.23 × 10^−8 | 0.39 | 5.07 × 10^−36 | 0.36 | 8.71 × 10^−31 |
| | 4 | 0.33 | 2.00 × 10^−2 | 0.08 | 2.00 × 10^−2 | 0.17 | 1.00 × 10^−5 | 0.15 | 1.00 × 10^−7 | 0.33 | 1.00 × 10^−9 | 0.28 | 1.00 × 10^−22 |
| | 4 | 0.42 | 2.50 × 10^−5 | 0.15 | 2.50 × 10^−5 | 0.17 | 4.34 × 10^−6 | −0.03 | 3.57 × 10^−1 | 0.42 | 2.72 × 10^−32 | 0 | 9.27 × 10^−1 |
| | 4 | 0.14 | 1.14 × 10^−83 | −0.27 | 1.14 × 10^−83 | 0.17 | 5.59 × 10^−35 | −0.17 | 3.95 × 10^−32 | 0.14 | 1.43 × 10^−22 | −0.29 | 1.43 × 10^−96 |
| | 4 | 0.26 | 3.44 × 10^−28 | 0.19 | 3.44 × 10^−28 | 0.05 | 1.48 × 10^−3 | 0.05 | 4.60 × 10^−3 | 0.26 | 1.45 × 10^−52 | 0.21 | 3.94 × 10^−36 |
| | 5 | 0.09 | 7.73 × 10^−2 | 0.08 | 8.93 × 10^−2 | 0.07 | 1.41 × 10^−1 | 0.03 | 5.30 × 10^−1 | 0.04 | 3.88 × 10^−1 | 0.09 | 7.73 × 10^−2 |
Figure 4. The distribution of the analyzed 25 organisms by taxonomic categories. Arrows point to the corresponding higher-order taxa of the analyzed species. The number near a species name corresponds to the base EEI type, and a colored square shows the value of the Spearman’s correlation coefficient (corr(PA|EEI)) (see the legend).
Figure 5. (a) Dependence of the corr(PA|EEI) coefficient on the logarithm of minimal doubling time for 25 organisms. The color indicates the EEI type (see the legend). The trend line is colored purple; (b) distribution of bacteria with diverse growth rates (with the minimal doubling time <2 h, ≥2 h and <5 h, and ≥5 h for the fast, medium, and slow growing bacteria, respectively) by the M parameter.
Figure 6. Dependence between corr(PA|EEI) and the M (mean ribosome protein-coding gene rank) and R (standard deviation of ribosome protein-coding genes’ rank) parameters.
The list of species under study (species for which proteomic data were collected) and corresponding assembly accessions.
| № | Species | Assembly Accession |
|---|---|---|
| 1 | | GCF_000021485.1 |
| 2 | | GCF_000008165.1 |
| 3 | | GCF_000011065.1 |
| 4 | | GCF_000046705.1 |
| 5 | | GCF_000009085.1 |
| 6 | | GCF_000020685.1 |
| 7 | | GCF_000195755.1 |
| 8 | | GCF_000005845.2 |
| 9 | | GCF_000006805.1 |
| 10 | | GCF_000008525.1 |
| 11 | | GCF_000006865.1 |
| 12 | | GCF_000008485.1 |
| 13 | | GCF_000007685.1 |
| 14 | | GCF_000196035.1 |
| 15 | | GCF_000010625.1 |
| 16 | | GCF_000195955.2 |
| 17 | | GCF_001272835.1 |
| 18 | | GCF_000008805.1 |
| 19 | | GCF_000006765.1 |
| 20 | | GCF_000006945.2 |
| 21 | | GCF_000006925.2 |
| 22 | | GCF_000009665.1 |
| 23 | | GCF_000006785.2 |
| 24 | | GCF_000009725.1 |
| 25 | | GCF_000022365.1 |
| 26 | | GCF_000009065.1 |
The description of elongation efficiency types (EEI types) calculation.
| Type | Codon Usage | Local Complementarity Level (Potential mRNA Secondary Structures) | Local Complementarity Level with the Energy of Potential mRNA Secondary Structures |
|---|---|---|---|
| EEI1 | + | − | − |
| EEI2 | − | + | − |
| EEI3 | − | − | + |
| EEI4 | + | + | − |
| EEI5 | + | − | + |
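The composition of the five EEI types can be written down as a small lookup table. This is only an illustrative encoding of the table above, not part of EloE: the class name, field names, and helper function are hypothetical, and the numbering of the rows as types 1 to 5 is an assumption based on the row order and on the EEI1 to EEI4 descriptions in the Figure 3 caption.

```python
from typing import NamedTuple


class EEIType(NamedTuple):
    """Which mRNA features an EEI type takes into account (hypothetical names)."""
    codon_usage: bool
    structure_number: bool  # local complementarity level (potential secondary structures)
    structure_energy: bool  # local complementarity level with structure energy


# Assumption: rows of the table map to types 1..5 in order.
EEI_TYPES = {
    1: EEIType(codon_usage=True, structure_number=False, structure_energy=False),
    2: EEIType(codon_usage=False, structure_number=True, structure_energy=False),
    3: EEIType(codon_usage=False, structure_number=False, structure_energy=True),
    4: EEIType(codon_usage=True, structure_number=True, structure_energy=False),
    5: EEIType(codon_usage=True, structure_number=False, structure_energy=True),
}


def types_using(feature):
    """Return the sorted EEI type numbers whose index includes the given feature."""
    return sorted(t for t, spec in EEI_TYPES.items() if getattr(spec, feature))


codon_based = types_using("codon_usage")
```

For example, `types_using("codon_usage")` lists the types whose index includes codon bias, which are the first, fourth, and fifth rows of the table.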
The table reflects minimal doubling time for each species in hours and in a logarithmic form. The article from which the data were taken is also presented for each of the organisms (column DT_source).
| Organism | Doubling_Time (DT), h | Log (DT) | DT_Source |
|---|---|---|---|
| | 5 | 0.69897 | [ |
| | 0.5 | −0.30103 | [ |
| | 2.7 | 0.431364 | [ |
| | 3 | 0.477121 | [ |
| | 2.466667 | 0.39211 | [ |
| | 2.6 | 0.414973 | [ |
| | 2.48 | 0.394452 | [ |
| | 0.333333 | −0.47712 | [ |
| | 11 | 1.041393 | [ |
| | 0.833333 | −0.07918 | [ |
| | 0.5 | −0.30103 | [ |
| | 3.3 | 0.518514 | [ |
| | 8.2 | 0.913814 | [ |
| | 0.516667 | −0.28679 | [ |
| | 46 | 1.662758 | [ |
| | 14.7 | 1.167317 | [ |
| | 8 | 0.90309 | [ |
| | 0.5 | −0.30103 | [ |
| | 0.5 | −0.30103 | [ |
| | 0.5 | −0.30103 | [ |
| | 0.4 | −0.39794 | [ |
| | 0.666667 | −0.17609 | [ |
| | 5.8 | 0.763428 | [ |
| | 4.5 | 0.653213 | [ |
| | 1 | 0 | [ |
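The Log (DT) column is consistent with a base-10 logarithm of the doubling time in hours (for example, log10(5) ≈ 0.69897). A minimal sketch of that transformation, together with the fast, medium, and slow growth classes defined in the Figure 5 caption (<2 h, ≥2 h and <5 h, ≥5 h), is given here; the function name and the choice of sample values are illustrative assumptions.

```python
from math import log10


def growth_class(doubling_time_h):
    """Classify by minimal doubling time using the Figure 5 thresholds."""
    if doubling_time_h < 2:
        return "fast"
    if doubling_time_h < 5:
        return "medium"
    return "slow"


# A few doubling times (hours) taken from the table above.
doubling_times_h = [5, 0.5, 2.7, 46]

# Each row: (doubling time, log10 of doubling time, growth class).
rows = [(dt, log10(dt), growth_class(dt)) for dt in doubling_times_h]
```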
The table represents M (the normalized mean rank of ribosomal genes) and R (the normalized standard deviation for ranks of ribosomal genes) calculated for each EEI type (1–5), the EEI type, and M and R values for the EEI type (the columns M_main, R_main).
| Organism | EEI Type | M1 | R1 | M2 | R2 | M3 | R3 | M4 | R4 | M5 | R5 | M_Main | R_Main |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 2 | 2 | 54 | 42 | 47 | 36 | 48 | 34 | 51 | 42 | 47 | 42 | 47 |
| | 4 | 76 | 48 | 38 | 46 | −43 | 55 | 77 | 46 | −48 | 51 | 77 | 46 |
| | 1 | 91 | 20 | −14 | 58 | 17 | 60 | 71 | 38 | 50 | 41 | 91 | 20 |
| | 4 | 54 | 42 | 32 | 53 | −5 | 57 | 61 | 41 | 16 | 60 | 61 | 41 |
| | 2 | −43 | 54 | 67 | 37 | −25 | 57 | 34 | 67 | −49 | 42 | 67 | 37 |
| | 1 | 85 | 31 | 30 | 53 | 19 | 56 | 84 | 22 | 64 | 49 | 87 | 28 |
| | 4 | 61 | 34 | 42 | 40 | 38 | 47 | 79 | 19 | 70 | 38 | 79 | 18 |
| | 1 | 87 | 30 | 13 | 63 | 14 | 53 | 82 | 34 | 75 | 36 | 87 | 30 |
| | 4 | −6 | 36 | 14 | 41 | 5 | 42 | 36 | 41 | 29 | 43 | 36 | 41 |
| | 2 | −33 | 55 | 51 | 44 | −10 | 63 | 32 | 52 | −24 | 59 | 51 | 44 |
| | 1 | 76 | 49 | 46 | 50 | −37 | 62 | 75 | 52 | −27 | 68 | 76 | 49 |
| | 4 | 16 | 61 | 27 | 61 | −11 | 59 | 66 | 43 | −10 | 57 | 66 | 43 |
| | 2 | −61 | 45 | 59 | 42 | −38 | 53 | 49 | 52 | −61 | 34 | 59 | 42 |
| | 1 | 79 | 36 | 28 | 59 | −23 | 62 | 77 | 38 | 13 | 61 | 79 | 36 |
| | 4 | −42 | 39 | 44 | 48 | −46 | 45 | 55 | 46 | −53 | 33 | 55 | 46 |
| | 4 | −7 | 60 | 18 | 56 | 8 | 61 | 36 | 63 | 29 | 69 | 36 | 63 |
| | 2 | −11 | 55 | 34 | 54 | −28 | 58 | 29 | 60 | −30 | 52 | 34 | 54 |
| | 3 | −75 | 27 | 83 | 18 | 83 | 17 | 25 | 43 | 44 | 34 | 83 | 17 |
| | 1 | 84 | 40 | 24 | 63 | 30 | 47 | 82 | 42 | 82 | 38 | 85 | 39 |
| | 1 | 95 | 6 | 6 | 64 | 7 | 53 | 87 | 19 | 79 | 35 | 94 | 12 |
| | 1 | 83 | 25 | 48 | 47 | −32 | 62 | 81 | 29 | −18 | 63 | 83 | 25 |
| | 1 | 91 | 26 | −4 | 62 | −25 | 59 | 88 | 27 | 26 | 71 | 91 | 26 |
| | 1 | 53 | 51 | 19 | 60 | −30 | 58 | 40 | 51 | −13 | 65 | 53 | 51 |
| | 1 | 77 | 32 | 0 | 65 | −37 | 54 | 63 | 47 | 3 | 67 | 77 | 32 |
| | 1 | 91 | 26 | −1 | 65 | −4 | 59 | 82 | 34 | 60 | 56 | 91 | 26 |