Wei-Hua Chen, Vera van Noort, Maria Lluch-Senar, Marco L Hennrich, Judith A H Wodke, Eva Yus, Andreu Alibés, Guglielmo Roma, Daniel R Mende, Christina Pesavento, Athanasios Typas, Anne-Claude Gavin, Luis Serrano, Peer Bork.
Abstract
We developed a comprehensive resource for the genome-reduced bacterium Mycoplasma pneumoniae comprising 1748 consistently generated '-omics' data sets, and used it to quantify the power of antisense non-coding RNAs (ncRNAs), lysine acetylation, and protein phosphorylation in predicting protein abundance (11%, 24% and 8%, respectively). These factors taken together are four times more predictive of the proteome abundance than of mRNA abundance. In bacteria, post-translational modifications (PTMs) and ncRNA transcription were both found to increase with decreasing genomic GC-content and genome size. Thus, the evolutionary forces constraining genome size and GC-content modify the relative contributions of the different regulatory layers to proteome homeostasis, and impact more genomic and genetic features than previously appreciated. Indeed, these scaling principles will enable us to develop more informed approaches when engineering minimal synthetic genomes.
Year: 2016 PMID: 26773059 PMCID: PMC4756857 DOI: 10.1093/nar/gkw004
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A comprehensive resource for M. pneumoniae and comparisons with selected model bacteria. (A) -omics data and the number of data sets collected (see Supplementary Table S1 for more details); (B) Comparison of data coverage with the selected model bacteria (as of May 2013). The number for each selected model bacterium comes from the individual study that included the largest number of data sets; if two or more studies contain the same number of ‘-omics’ data sets, the one published most recently was chosen (Supplementary Table S2). (C) A graphical view of all available data sets for M. pneumoniae, where each circular layer represents an -omics data type. Only data for the plus strand are shown; see Supplementary Figure S1 for a full-sized figure containing all data for both strands.
Figure 2. Factors controlling protein abundance. (A) Correlations of individual factors with mRNA and protein (partial) abundances. Levels of significance: *** < 0.001, ** < 0.01, * < 0.05. Percentages higher than 10% in the last column are highlighted in red. (B) Combined contributions of the factors listed in (A) to protein abundance variation, estimated using multivariate adaptive regression splines (MARS) analysis. (C) A schematic view of the information flow from genome to RNA to protein and the additional regulatory layers. The widths of the dark-blue arrows correspond to the relative contributions to protein abundances as compared with mRNA abundances.
Figure 3. The percentage of post-translationally modifiable residues (PTMRs) decreases with increasing GC-content. Colored dots: selected model bacterial species; black dots: the other 1600 bacterial species. (A) The proportion of putative acetylation targets (lysine, K) in a genome decreases with increasing genomic GC-content. (B) Proportion of putative phosphorylation targets as a function of genomic GC-content; shown is the major phosphorylation target serine (S).
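The two quantities plotted in Figure 3 can be illustrated with a minimal Python sketch. It assumes that genomic GC-content is the fraction of G and C among unambiguous bases and that the proportion of putative targets is the fraction of a given residue across all protein sequences of a genome; the record does not state the exact definitions used in the study, so the function names `gc_content` and `residue_fraction` and these definitions are illustrative assumptions only.

```python
def gc_content(dna: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are ignored.
    """
    seq = dna.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / counted


def residue_fraction(proteins: list[str], residue: str) -> float:
    """Fraction of all amino acids in a set of proteins equal to `residue`.

    Assumption: every occurrence of the residue counts as a putative
    modification target (e.g. "K" for acetylation, "S" for phosphorylation).
    """
    total = sum(len(p) for p in proteins)
    if total == 0:
        raise ValueError("no protein sequence provided")
    hits = sum(p.upper().count(residue.upper()) for p in proteins)
    return hits / total
```

Computing `gc_content` on a genome sequence and `residue_fraction(proteome, "K")` on its predicted proteins would give one point of the kind shown in panel (A), under the assumptions above.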
Figure 4. Dissecting the relative predictive powers of genome size and GC-content on selected genomic and genetic features that have been used to derive scaling laws. (A) Most of the selected features correlate with both genome size and GC-content in a similar way using regular Pearson correlation; (B) Genome size and GC-content correlate significantly across 1600 bacteria. The dashed red line represents the linear regression. (C) Separating the impact of one factor from the other using partial correlation. Genome size was found to have more predictive power than GC-content for some features (in dark green), while for others, GC-content was found to be more predictive (in dark red). A factor (i.e. genome size or GC-content) is defined as a major contributor if it has significantly more predictive power than the other. For this to be true, one of the following conditions must be satisfied: (i) the factor correlates significantly with a genomic feature while the other factor does not, or (ii) both factors correlate significantly with a genomic feature, but the absolute correlation coefficient of one is at least twice that of the other. (D) Distances of the features in (C) to the diagonal line, showing the relative predictive power (absolute partial correlation coefficient value) of GC-content over genome size; the further left a feature lies on the x-axis, the greater the predictive power of GC-content relative to genome size. Black data points in (C) and (D) are those for which GC-content and genome size have similar predictive powers. See Supplementary Table S13 for the data.
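The logic of panel (C) can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: it uses the standard first-order partial correlation formula built on Pearson coefficients, and the function names, the significance threshold `alpha = 0.05`, and the handling of the case where neither factor is significant are assumptions that the caption does not specify. P-values are passed in by the caller rather than computed here.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def major_contributor(r_size, p_size, r_gc, p_gc, alpha=0.05):
    """Apply conditions (i) and (ii) from the caption.

    r_* are partial correlation coefficients, p_* their p-values.
    alpha is an assumed threshold; "similar" is also returned when
    neither factor is significant, which is an assumption.
    """
    sig_size, sig_gc = p_size < alpha, p_gc < alpha
    if sig_size and not sig_gc:          # condition (i)
        return "genome size"
    if sig_gc and not sig_size:          # condition (i)
        return "GC-content"
    if sig_size and sig_gc:              # condition (ii): factor of two
        if abs(r_size) >= 2 * abs(r_gc):
            return "genome size"
        if abs(r_gc) >= 2 * abs(r_size):
            return "GC-content"
    return "similar"
```

For a given feature, `partial_corr(feature, genome_size, gc)` and `partial_corr(feature, gc, genome_size)` would supply the two coefficients that `major_contributor` compares.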