Chong Peng, Feng Gao.
Abstract
Essential genes, those critical for the survival of an organism under certain conditions, play a significant role in pharmaceutics and synthetic biology. Knowledge of protein localization is invaluable for understanding protein function as well as the interactions between different proteins. However, essential genes have not previously been examined systematically from the perspective of the localizations of the proteins they encode. Here, a comprehensive protein localization analysis of essential genes in 27 prokaryotes, including 24 bacteria, 2 mycoplasmas and 1 archaeon, has been performed. Both statistical analysis of the localization information in these genomes and the GO (Gene Ontology) terms enriched in the essential genes show that proteins encoded by essential genes are enriched in internal location sites, whereas a lower proportion of them reside in the cell envelope compared with non-essential proteins. Meanwhile, few essential proteins are found at external subcellular location sites such as the flagellum and fimbrium, and proteins encoded by non-essential genes tend to have diverse localizations. These results provide further insights into the fundamental functions needed to support cellular life and could improve gene essentiality prediction by taking protein localization and enriched GO terms into account.
Year: 2014 PMID: 25105358 PMCID: PMC4126397 DOI: 10.1038/srep06001
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Information on the organisms used in the current study
| Group ᵃ | RefSeq | Essential gene dataset ᵇ | Non-essential gene dataset ᵇ | Source ᶜ |
|---|---|---|---|---|
| Bacteria (−) | NC_005966 | 499, 90.38%, 2.81% | 2594, 76.21%, 2.08% | I |
| Bacteria (+) | NC_000964 | 271, 94.83%, 1.48% | 3955, 81.85%, 1.29% | II |
| Bacteria (−) | NC_004663 | 325, 84.62%, 5.23% | 4453, 65.10%, 5.19% | II |
| Bacteria (−) | NC_007650 | 42, 84.73%, 4.93% | 2314, 70.65%, 3.23% | II |
| Bacteria (−) | NC_007651 | 364, 84.73%, 4.93% | 2912, 70.65%, 3.23% | II |
| Bacteria (−) | NC_002163 | 228, 76.75%, 3.07% | 1395, 80.43%, 2.94% | II |
| Bacteria (−) | NC_011916 | 480, 83.96%, 4.79% | 3224, 63.31%, 3.97% | I |
| Bacteria (−) | NC_000913 | 609, 80.13%, 2.79% | 2923, 82.18%, 2.39% | I |
| Bacteria (−) | NC_008601 | 392, 87.76%, 3.83% | 1329, 76.98%, 3.01% | II |
| Bacteria (−) | NC_000907 | 642, 87.07%, 1.25% | 512, 87.11%, 1.95% | I |
| Bacteria (−) | NC_000915 | 323, 76.47%, 2.48% | 1135, 76.83%, 4.58% | I |
| Archaeon | NC_005791 | 519, 92.49%, 0.39% | 1077, 86.72%, 0.09% | I |
| Bacteria (−) | NC_000962 | 687, 85.01%, 3.78% | 3070, 61.34%, 2.31% | I |
| Mycoplasmas | NC_000908 | 381, 78.48%, 0.52% | 94, 67.02%, 0.00% | I |
| Mycoplasmas | NC_002771 | 310, 80.00%, 0.65% | 322, 60.25%, 0.62% | I |
| Bacteria (−) | NC_010729 | 463, 87.26%, 1.94% | 1627, 62.81%, 3.01% | II |
| Bacteria (−) | NC_002516 | 117, 85.47%, 1.71% | 5454, 76.88%, 3.04% | II |
| Bacteria (−) | NC_008463 | 335, 79.10%, 1.79% | 960, 65.83%, 1.98% | I |
| Bacteria (−) | NC_004631 | 358, 90.78%, 2.79% | 3906, 73.35%, 2.18% | I |
| Bacteria (−) | NC_016810 | 353, 88.67%, 2.55% | 4035, 74.65%, 2.55% | I |
| Bacteria (−) | NC_016856 | 105, 90.48%, 1.90% | 5210, 63.88%, 2.13% | II |
| Bacteria (−) | NC_003197 | 230, 87.83%, 3.91% | 4228, 74.83%, 2.46% | II |
| Bacteria (−) | NC_004347 | 402, 88.81%, 3.48% | 1103, 86.49%, 3.45% | I |
| Bacteria (−) | NC_009511 | 535, 76.26%, 2.80% | 4315, 72.28%, 4.47% | II |
| Bacteria (+) | NC_002745 | 302, 95.03%, 0.66% | 2281, 80.45%, 1.18% | II |
| Bacteria (+) | NC_007795 | 351, 90.03%, 1.99% | 2541, 76.62%, 0.87% | II |
| Bacteria (+) | NC_009009 | 218, 94.50%, 0.46% | 2052, 82.36%, 1.17% | I |
| Bacteria (−) | NC_002505 | 565, 56.87%, 1.41% | 2105, 78.05%, 2.89% | II |
| Bacteria (−) | NC_002506 | 214, 56.87%, 1.41% | 838, 78.05%, 2.89% | II |
ᵃ Bacteria (+): Gram-positive bacteria; Bacteria (−): Gram-negative bacteria.
ᵇ Each dataset entry gives three numbers, X, Y%, Z%. X is the number of essential (or non-essential) genes in the organism. Y% is the prediction coverage, i.e. the percentage of those genes whose protein localization could be predicted. Z% is the percentage of those genes whose proteins may have multiple localization sites.
ᶜ Source of the non-essential genes. I: the non-essential genes were taken from the original literature. II: the non-essential genes are the complement of the essential gene set, i.e. all remaining genes in the genome.
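The footnotes define X, Y% and Z% and two ways of obtaining the non-essential set. The Python sketch below shows that arithmetic under stated assumptions. `sites_by_gene` is a hypothetical mapping from gene ID to a tuple of predicted localization sites (a missing key or empty tuple means no prediction), and Z% is taken over all X genes. That denominator is an inference, but the NC_000908 row is consistent with it: 2/381 ≈ 0.52%, whereas dividing by the 299 genes with a prediction (78.48% of 381) would give about 0.67%. Source I comes from the literature and cannot be computed. Source II is a plain set difference.

```python
def summarize(genes, sites_by_gene):
    """Return (X, Y%, Z%) for one gene set, following the table footnote.

    sites_by_gene: hypothetical mapping gene ID -> tuple of predicted sites;
    a missing key or an empty tuple means no localization was predicted.
    """
    x = len(genes)
    # Y%: genes with at least one predicted localization site.
    predicted = sum(1 for g in genes if sites_by_gene.get(g))
    # Z%: genes whose protein may have more than one site (denominator X, assumed).
    multiple = sum(1 for g in genes if len(sites_by_gene.get(g, ())) > 1)
    return x, round(100 * predicted / x, 2), round(100 * multiple / x, 2)


def complement_set(all_genes, essential):
    """Source II: every gene in the genome that is not in the essential set."""
    return set(all_genes) - set(essential)


# Synthetic data shaped like the NC_000908 essential row: 381 genes,
# 299 with a prediction, 2 of which carry two (hypothetical) site labels.
genes = [f"g{i}" for i in range(381)]
sites = {g: ("Cytoplasmic",) for g in genes[:297]}
sites.update({g: ("Cytoplasmic", "CytoplasmicMembrane") for g in genes[297:299]})
print(summarize(genes, sites))  # (381, 78.48, 0.52)
```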
Figure 1. Statistically significant GO terms in the cellular component category, plotted together with phylogenetic information.
The vertical axis lists every GO term with a p value below 0.05 (Fisher's exact test) in more than two organisms. A red cell means that the GO term in that row is over-represented in the organism in that column; a blue cell means that it is under-represented; a white cell means that it is not statistically significant in that organism. The tree at the top of the figure shows the phylogenetic relationships of the organisms used in the current study.
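The caption states that each GO term was tested with Fisher's exact test and plotted when p < 0.05 in more than two organisms. The Python sketch below reproduces that filtering with the standard library only. It rests on several assumptions. There is one 2×2 table per term and organism, with rows for essential and non-essential genes and columns for annotated and not annotated with the term. The test is two-sided. No multiple-testing correction is applied, since the caption does not mention one. A term is called over-represented when its odds ratio ad/bc exceeds 1. The function names and GO term keys are illustrative. `scipy.stats.fisher_exact` with its default two-sided alternative should return the same p-value up to floating-point tolerance.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = essential / non-essential genes,
    columns = annotated with the GO term / not annotated.
    """
    r1, r2 = a + b, c + d  # row totals
    c1 = a + c             # genes annotated with the term
    n = r1 + r2

    def weight(x):
        # Hypergeometric weight of the table whose top-left cell is x;
        # the shared denominator comb(n, c1) is applied once at the end.
        return comb(r1, x) * comb(r2, c1 - x)

    observed = weight(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum the weights of all tables no more probable than the observed one.
    tail = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return tail / comb(n, c1)


def classify(a, b, c, d, alpha=0.05):
    """Return 'over', 'under' or 'ns' for one GO term in one organism."""
    if fisher_exact_two_sided(a, b, c, d) >= alpha:
        return "ns"
    if a * d > b * c:  # odds ratio > 1: term more frequent among essential genes
        return "over"
    if a * d < b * c:  # odds ratio < 1: term less frequent among essential genes
        return "under"
    return "ns"


def terms_to_plot(status):
    """Keep GO terms that are significant in more than two organisms.

    status: {go_term: {organism: 'over' | 'under' | 'ns'}}
    """
    return [term for term, per_org in status.items()
            if sum(s != "ns" for s in per_org.values()) > 2]


print(f"{fisher_exact_two_sided(8, 2, 1, 5):.4f}")  # 0.0350
print(classify(8, 2, 1, 5))  # over
```

Weights are compared as exact integers, so tables of equal probability are counted without the floating-point tolerance that a float-based implementation needs.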
Figure 2. (a) Percentages of proteins located in the cytoplasm, the cytoplasmic membrane and the extracellular space for essential and non-essential genes in the 27 genomes. (b) Average percentages of proteins located in the periplasm, the outer membrane and the cell wall for essential and non-essential genes in the relevant genomes.
Figure 3. Distribution of essential proteins (inner ring of the doughnut chart) and non-essential proteins (outer ring) in (a) Bacillus subtilis 168, (b) Escherichia coli MG1655 and (c) Mycoplasma genitalium G37.
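Figures 2 and 3 report, per genome, the share of essential and non-essential proteins at each localization site. Figure 2(b) averages those shares over the relevant genomes. The Python sketch below illustrates that bookkeeping. It assumes that each protein already carries a single site label; how multi-site proteins were assigned is not described here. It also assumes that proteins labelled `"Unknown"` are excluded from the denominator, and that the caller passes only the genomes in which a site applies (for example, only genomes with an outer membrane when averaging that site). The label strings and function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def localization_percentages(labels, exclude=("Unknown",)):
    """Percentage of proteins at each localization site in one gene set.

    labels: one (hypothetical) site label per protein, e.g. "Cytoplasmic".
    Labels listed in `exclude` are dropped from numerator and denominator.
    """
    counts = Counter(label for label in labels if label not in exclude)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {site: 100 * n / total for site, n in counts.items()}


def average_percentage(per_genome, site):
    """Mean share of `site` over the genomes supplied (as in Figure 2b).

    per_genome: list of dicts returned by localization_percentages; a genome
    with no protein at `site` contributes 0%.
    """
    return mean(p.get(site, 0.0) for p in per_genome)


# Made-up labels; call once for essential and once for non-essential genes
# to obtain the inner and outer rings of a Figure 3-style chart.
essential = ["Cytoplasmic"] * 3 + ["CytoplasmicMembrane"] + ["Unknown"] * 2
print(localization_percentages(essential))
# {'Cytoplasmic': 75.0, 'CytoplasmicMembrane': 25.0}
```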