Xizeng Mao, Han Zhang, Yanbin Yin, Ying Xu.
Abstract
The majority of bacterial genes are located on the leading strand, and the percentage of such genes varies widely across bacteria. Although some explanations have been proposed, they are at most partial: they cover only small percentages of the genes and do not consider genes biased toward the lagging strand. We have carried out a computational study on 725 bacterial genomes, aiming to elucidate other factors that may have influenced the strand location of genes in a bacterium. Our analyses suggest that (i) genes of some functional categories, such as ribosome, preferentially lie on the leading strand; (ii) genes of some functional categories, such as transcription factor, preferentially lie on the lagging strand; (iii) there is a balancing force that tends to keep genes from all moving to the leading, more efficient strand; and (iv) the percentage of leading-strand genes in a bacterium can be accurately explained from the numbers of genes in the functional categories outlined in (i) and (ii), together with genome size and gene density, indicating that these numbers implicitly contain the information about the percentage of genes on the leading versus lagging strand in a genome.
Year: 2012 PMID: 22735706 PMCID: PMC3458553 DOI: 10.1093/nar/gks605
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. General characteristics for leading-strand genes. (A) Distribution of the number of bacteria with a specific percentage of genes on the leading strands; and (B) distribution of the percentages of leading-strand genes versus cell growth rate in the 104 bacterial genomes with growth rate data available.
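As a minimal sketch of how the leading-strand percentage summarized in Figure 1 might be computed: this record does not describe the authors' pipeline, so the code below assumes a single circular chromosome with known origin and terminus coordinates, and gene records given as hypothetical `(start, end, strand)` tuples with `start <= end`. A gene is counted as leading-strand when it is transcribed in the same direction as the replication fork that passes through it.

```python
from typing import Iterable, Tuple

# Hypothetical gene record: (start, end, strand), strand is '+' or '-'.
Gene = Tuple[int, int, str]


def on_leading_strand(gene: Gene, origin: int, terminus: int, genome_len: int) -> bool:
    """Return True if the gene is co-oriented with replication (assumed model)."""
    start, end, strand = gene
    midpoint = ((start + end) // 2) % genome_len
    # Distances measured in the direction of increasing coordinates from the origin.
    gene_offset = (midpoint - origin) % genome_len
    terminus_offset = (terminus - origin) % genome_len
    # Between origin and terminus the fork moves toward increasing coordinates,
    # so '+' genes are leading there; in the other replichore '-' genes are leading.
    in_forward_replichore = gene_offset < terminus_offset
    return (strand == "+") == in_forward_replichore


def leading_strand_percentage(
    genes: Iterable[Gene], origin: int, terminus: int, genome_len: int
) -> float:
    """Percentage of genes located on the leading strand."""
    gene_list = list(genes)
    if not gene_list:
        raise ValueError("no genes supplied")
    leading = sum(
        on_leading_strand(g, origin, terminus, genome_len) for g in gene_list
    )
    return 100.0 * leading / len(gene_list)
```

Genes that span the origin, and genomes with multiple replicons, would need extra handling that this sketch leaves out.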
Preference of GOslim categories toward leading strands across 725 bacterial genomes
| GO branch | GO category | Preference |
|---|---|---|
| MF | GO:0005198 structural molecule activity | 7.28E-156 |
| MF | GO:0003723 RNA binding | 1.15E-107 |
| MF | GO:0008135 translation factor activity | 4.67E-78 |
| MF | GO:0005515 protein binding | 1.37E-32 |
| MF | GO:0003774 motor activity | 2.05E-24 |
| MF | GO:0000166 nucleotide binding | 5.93E-14 |
| MF | GO:0003676 nucleic acid binding | 1.04E-13 |
| MF | GO:0030234 enzyme regulator activity | 1.97E-08 |
| MF | GO:0016740 transferase activity | 9.62E-05 |
| BP | GO:0006412 translation | 6.77E-118 |
| BP | GO:0007049 cell cycle | 8.66E-72 |
| BP | GO:0019538 protein metabolic process | 7.64E-56 |
| BP | GO:0015031 protein transport | 7.60E-34 |
| BP | GO:0016043 cellular component organization | 1.81E-30 |
| BP | GO:0009605 response to external stimulus | 1.89E-20 |
| BP | GO:0007154 cell communication | 2.51E-18 |
| BP | GO:0006091 generation of precursor metabolites and energy | 1.11E-17 |
| BP | GO:0005975 carbohydrate metabolic process | 7.17E-16 |
| BP | GO:0019748 secondary metabolic process | 3.10E-15 |
| BP | GO:0006629 lipid metabolic process | 1.37E-11 |
| BP | GO:0009056 catabolic process | 1.51E-06 |
| BP | GO:0006519 cellular amino acid and derivative metabolic process | 2.41E-06 |
| BP | GO:0006811 ion transport | 5.37E-05 |
| BP | GO:0006950 response to stress | 3.26E-03 |
| CC | GO:0005840 ribosome | 1.97E-158 |
| CC | GO:0043226 organelle | 5.17E-116 |
| CC | GO:0005737 cytoplasm | 6.29E-56 |
| CC | GO:0005622 intracellular | 6.67E-41 |
| CC | GO:0005694 chromosome | 1.11E-26 |
| CC | GO:0043234 protein complex | 6.98E-26 |
| CC | GO:0005618 cell wall | 5.31E-11 |
| CC | GO:0005886 plasma membrane | 1.42E-06 |
The first column represents the three major GO categories: molecular function (MF), cellular component (CC) and biological process (BP).
Preference of GOslim categories toward lagging strands across 725 bacterial genomes
| GO branch | GO category | Preference |
|---|---|---|
| MF | GO:0003700 sequence specific DNA binding transcription factor activity | 3.09E-34 |
| MF | GO:0016209 antioxidant activity | 3.31E-11 |
| MF | GO:0003677 DNA binding | 3.41E-11 |
| MF | GO:0004871 signal transducer activity | 2.10E-06 |
| MF | GO:0004672 protein kinase activity | 9.34E-06 |
| MF | GO:0008233 peptidase activity | 1.08E-05 |
| BP | GO:0050789 regulation of biological process | 3.20E-15 |
| BP | GO:0019725 cellular homeostasis | 3.08E-11 |
| BP | GO:0006350 transcription | 2.94E-09 |
| BP | GO:0006464 protein modification process | 1.06E-06 |
| BP | GO:0007165 signal transduction | 6.31E-05 |
The first column represents the three major GO categories: molecular function (MF), cellular component (CC) and biological process (BP).
Figure 2. Boxplots of the percentage of coding region versus the percentage of leading-strand genes in a genome. (A) For all bacteria (P value of the Wilcoxon test: 1.1 × 10⁻⁸); (B) bacteria of specialized type with P value 0.22; (C) bacteria of host-associated type with P value 0.54; (D) bacteria of aquatic type with P value 0.065; (E) bacteria of terrestrial type with P value 0.0031 and (F) bacteria of multiple type with P value 1.9 × 10⁻⁹.
Figure 3. Performance in predicting the percentage of leading-strand genes in a genome by our trained neural network on the training, validation and testing set, respectively.
Figure 4. Evaluation of performance in predicting the percentage of leading-strand genes in a genome with smaller numbers of inputs by our trained neural network.
Twenty-five selected inputs used in the neural network model
| Category | Variable | MIV |
|---|---|---|
| BP | GO:0007049 cell cycle | 0.012344 |
| BP | GO:0006811 ion transport | 0.005643 |
| BP | GO:0006629 lipid metabolic process | −0.00275 |
| BP | GO:0019748 secondary metabolic process | −0.00294 |
| BP | GO:0006810 transport | −0.00332 |
| BP | GO:0006950 response to stress | −0.00388 |
| BP | GO:0006139 nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | −0.00402 |
| BP | GO:0019725 cellular homeostasis | −0.00466 |
| BP | GO:0006412 translation | −0.00825 |
| BP | GO:0006091 generation of precursor metabolites and energy | −0.01027 |
| CC | GO:0005840 ribosome | 0.009582 |
| CC | GO:0005737 cytoplasm | 0.007178 |
| CC | GO:0005622 intracellular | 0.003747 |
| CC | GO:0043226 organelle | 0.002657 |
| CC | GO:0030312 external encapsulating structure | −0.00234 |
| CC | GO:0030313 cell envelope | −0.00319 |
| CC | GO:0043234 protein complex | −0.0049 |
| MF | GO:0003723 RNA binding | 0.027956 |
| MF | GO:0009055 electron carrier activity | 0.003654 |
| MF | GO:0016301 kinase activity | 0.002421 |
| MF | GO:0030234 enzyme regulator activity | −0.00182 |
| MF | GO:0008135 translation factor activity | −0.00304 |
| MF | GO:0005198 structural molecule activity | −0.01745 |
| OT | Gene density | −0.00451 |
| OT | Genome size | −0.00515 |
Biological process (BP), cellular component (CC) and molecular function (MF) in the first column are the top-level categories in the gene ontology (GO) hierarchy; OT is for other variables that are not GO categories.
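The MIV column is not defined in this record. Assuming it stands for mean impact value, a perturbation-based measure often used to rank neural network inputs, it could be computed roughly as sketched below: each input is raised and lowered by a fixed fraction (10% here, an assumed convention), and the resulting difference in model output is averaged over all samples. The `predict` callable is a hypothetical stand-in for a trained model, not the authors' network.

```python
from typing import Callable, List, Sequence


def mean_impact_values(
    predict: Callable[[Sequence[float]], float],
    samples: Sequence[Sequence[float]],
    delta: float = 0.10,  # assumed perturbation fraction
) -> List[float]:
    """Mean change in model output when each input is perturbed by +/- delta."""
    if not samples:
        raise ValueError("no samples supplied")
    n_inputs = len(samples[0])
    mivs = []
    for j in range(n_inputs):
        total = 0.0
        for row in samples:
            raised = list(row)
            lowered = list(row)
            raised[j] = row[j] * (1.0 + delta)
            lowered[j] = row[j] * (1.0 - delta)
            # Signed impact of input j on the prediction for this sample.
            total += predict(raised) - predict(lowered)
        mivs.append(total / len(samples))
    return mivs
```

Under this reading, the sign of an MIV gives the direction in which an input moves the predicted percentage, and its absolute value gives the input's relative importance.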