
Glucan Biosynthesis Protein G Is a Suitable Reference Gene in Escherichia coli K-12.

Sean S J Heng, Oliver Y W Chan, Bryan M H Keng, Maurice H T Ling.

Abstract

The expression of reference genes used in gene expression studies is assumed to be stable under most circumstances. However, a number of studies have demonstrated that such genes vary under experimental conditions. In addition, genes that are stably expressed in one organ may not be stably expressed in other organs or other organisms, suggesting the need to identify reference genes for each organ and organism. This study aims to identify stably expressed genes in Escherichia coli. Three microarray datasets from E. coli substrain MG1655 and 1 dataset from W3110 were analysed. The coefficient of variation (COV) of each gene was calculated, and the genes in the lowest 10% by COV, from 4631 genes common to the 3 MG1655 sets, were analysed using NormFinder. Glucan biosynthesis protein G (mdoG), which is involved in cell wall synthesis, displayed the lowest weighted COV and weighted NormFinder Stability Index for the MG1655 datasets and was also the most stable gene in the dataset for substrain W3110, suggesting that mdoG is a suitable reference gene for E. coli K-12. Gene ontology over-representation analysis of the 39 genes suggested an over-representation of cell division, carbohydrate metabolism, and protein synthesis, which is consistent with the short generation time of E. coli.

Year:  2011        PMID: 23724305      PMCID: PMC3658596          DOI: 10.5402/2011/469053

Source DB:  PubMed          Journal:  ISRN Microbiol


1. Introduction

Gene expression analysis examines variations in gene expression that result from changes in environmental conditions by measuring mRNA levels over time. Quantitative real-time polymerase chain reaction (qRT-PCR) is a commonly used technique to quantify gene expression [1]. However, several parameters need to be controlled in this process in order to obtain accurate and reliable results. These include variations in the amount of starting material between samples, RNA extraction efficiency, RNA integrity/quality, efficiency of cDNA synthesis, and differences in the overall transcriptional activity of the cells analyzed. Of these, only the differences in transcriptional activity are of interest. A possible method of accounting for the other effects is relative normalization, which is the correction of the raw expression values with a reference gene. The reference gene acts as an invariant endogenous control, which implies that reference genes should be stably expressed under a wide variety of conditions [2]. However, several studies have suggested that it is not easy to find universal reference genes [3-5]. This corroborates several studies demonstrating that genes originally considered invariable in expression may vary under different experimental conditions [6-8]. For an accurate comparison of gene expression in different samples, it is necessary either to use verified reference genes, such as GAPDH (glyceraldehyde-3-phosphate dehydrogenase) [9] or UBQ (ubiquitin) [9], for normalisation or to determine new ones for each experimental system with varying external stimuli [3, 10]. However, some studies have also demonstrated that the expression of GAPDH [11] and UBQ [12] varies under some conditions.
Other studies have also identified reference genes, such as recA (recombinase A), proC (pyrroline-5-carboxylate reductase), and gyrA (DNA gyrase) in Pectobacterium atrosepticum [8], and map (methionine aminopeptidase), rpoC (RNA polymerase, beta prime subunit), and alaS (alanyl-tRNA synthetase) in Acidithiobacillus ferrooxidans [13]. This suggests that established reference genes for a particular organism may not be suitable for other organisms. Escherichia coli, a Gram-negative bacterium commonly found in the gastrointestinal tract, was selected for this study as it has a genome of approximately 4,000 genes and the genetic material in its plasmids is easily manipulated. Furthermore, E. coli is a commonly studied prokaryotic model [14, 15], as it is easily cultured in the laboratory environment and is of low pathogenicity [16-18]. Candidate reference genes, which are commonly believed to be invariant, can be identified using algorithms such as geNorm [19], NormFinder [20], and BestKeeper [21]. These methods require a wide range of accessible gene expression data, normally obtained through expression profiling such as quantitative PCR. Microarrays, which usually contain thousands of probes, also present a good source of data for identifying reference genes [22]. A recent study successfully identified MARK3 as a suitable reference gene in mouse liver using microarray analysis [23]. Currently, numerous studies are being conducted to validate known reference genes and possibly identify new ones [3, 8, 9, 13, 24]. In this study, we identify and evaluate a set of invariant genes in E. coli K-12 substrains MG1655 and W3110. Our results suggest that glucan biosynthesis protein G (mdoG) is a suitable reference gene for both the MG1655 and W3110 substrains of E. coli.
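
As a minimal illustration of relative normalization with a reference gene, the sketch below divides a target gene's raw expression by the reference gene's expression in the same sample. The function name `normalise` and the numbers are hypothetical and are not taken from the datasets in this study; the point is only that a sample-wide factor (for example, twice the starting material) cancels out when the reference gene is truly invariant.

```python
def normalise(target, reference):
    """Divide each sample's target expression by the reference gene's
    expression in the same sample (simple ratio normalization)."""
    if len(target) != len(reference):
        raise ValueError("target and reference must cover the same samples")
    return [t / r for t, r in zip(target, reference)]


# Hypothetical raw values: sample 2 had twice the starting material of
# sample 1, sample 3 had half, so both genes scale by the same factor.
raw_target = [200.0, 400.0, 100.0]
raw_reference = [100.0, 200.0, 50.0]

normalised = normalise(raw_target, raw_reference)
print(normalised)
```

If the reference gene itself responded to the experimental condition, the ratio would mix the two effects, which is why the stability of the reference has to be checked for each organism and condition.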

2. Materials and Methods

2.1. Microarray Data

Four datasets were obtained from a publicly available microarray database, Gene Expression Omnibus, National Center for Biotechnology Information, of which 3 were from E. coli K-12 substrain MG1655 and 1 was from substrain W3110. Briefly, the studies that produced the datasets are as follows.
GDS680: MG1655 grown in either aerobic or anaerobic conditions, deleted for transcriptional regulators in oxygen response, and used to validate a computational model of transcriptional and metabolic networks.
GDS1099: aerobically grown MG1655 cells in several media with varied carbon sources including glucose, glycerol, succinate, L-alanine, acetate, and L-proline.
GDS1494: analysis of derivatives of strain MG1655 (wild type, fur mutant, and wild type with added FeSO4) induced to overexpress RyhB, a noncoding RNA regulated by the fur repressor protein.
GDS1827: W3110 cells grown aerobically and exposed to low, neutral, or high pH to study acid and base response.

2.2. Finding Invariant Genes

The coefficient of variation (COV) of every gene was calculated as the quotient of the standard deviation and the arithmetic mean. From 4631 genes, the 10% with the lowest COV in each dataset were listed. The intersection of these lists across the 3 MG1655 datasets (GDS680, GDS1099, and GDS1494) was identified and analysed using NormFinder version 0.953 [20] to rank the stability of these genes. A weighted stability index for each gene was then calculated as the average of the per-dataset NormFinder stability indexes, each weighted by the number of samples in its dataset.
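
The filtering procedure in this section can be sketched as follows. This is an illustrative sketch rather than the code used in the study: the function names, the toy expression values, and the choice of the sample standard deviation are assumptions, and the weighting is one reading of the description above (a mean weighted by the number of samples per dataset).

```python
from statistics import mean, stdev


def cov(values):
    """Coefficient of variation: standard deviation divided by the mean.
    The sample standard deviation is an assumption; the text does not say."""
    return stdev(values) / mean(values)


def lowest_cov_genes(dataset, fraction=0.10):
    """Return the genes in the lowest `fraction` of a dataset when ranked by COV.
    `dataset` maps gene symbol -> list of expression values across samples."""
    ranked = sorted(dataset, key=lambda gene: cov(dataset[gene]))
    keep = max(1, int(len(ranked) * fraction))
    return set(ranked[:keep])


def invariant_candidates(datasets, fraction=0.10):
    """Intersect the lowest-COV subsets of several datasets."""
    subsets = [lowest_cov_genes(d, fraction) for d in datasets]
    return set.intersection(*subsets)


def weighted_mean(values, sample_counts):
    """Mean of per-dataset values weighted by the number of samples per dataset."""
    total = sum(sample_counts)
    return sum(v * n for v, n in zip(values, sample_counts)) / total


# Hypothetical toy data (4 genes, 3 samples each), not taken from any GDS record.
toy_a = {"geneA": [10, 10.5, 9.8], "geneB": [5, 9, 2],
         "geneC": [7, 7.1, 6.9], "geneD": [1, 4, 8]}
toy_b = {"geneA": [20, 21, 19], "geneB": [3, 3.1, 2.9],
         "geneC": [8, 8.2, 7.9], "geneD": [2, 9, 5]}

# With only 4 genes a 50% cut-off is used so that the subsets are non-trivial.
print(invariant_candidates([toy_a, toy_b], fraction=0.5))
```

Each gene is visited once per dataset, so the cost of the COV filter grows linearly with the amount of data, which is what makes it usable as a first pass over whole microarrays.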

2.3. Gene Ontology Overrepresentation Analysis

The list of genes from the intersection of the 10% with the lowest COV in each of the 3 MG1655 datasets was analysed for gene ontology overrepresentation using the Gene Ontology gene annotation file for E. coli dated July 8, 2011. A chi-square test was carried out to identify the overrepresented gene ontology terms in the list of genes at an overall significance level of P = 0.01, corrected for multiple testing using the Holm-Bonferroni method [25].
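
A rough sketch of this kind of test is given below. The 2x2 layout (genes in the list versus genes in the rest of the genome, annotated versus not annotated with a term), the absence of a continuity correction, and all function names are assumptions made for illustration; the text does not specify how the chi-square test was set up. The Holm-Bonferroni step follows the usual step-down rule: the k-th smallest of m P values is compared with alpha / (m - k + 1), and testing stops at the first failure.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]],
    without continuity correction."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator


def chi_square_p_value_1df(statistic):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(statistic / 2.0))


def holm_bonferroni(p_values, alpha=0.01):
    """Return one boolean per P value: True where the null hypothesis is
    rejected under Holm's step-down procedure at overall level `alpha`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, index in enumerate(order):
        if p_values[index] <= alpha / (m - rank):
            rejected[index] = True
        else:
            break  # once one test fails, all larger P values are retained
    return rejected


# Hypothetical P values for four GO terms.
print(holm_bonferroni([0.001, 0.04, 0.003, 0.5], alpha=0.01))
```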

2.4. Comparing NormFinder and COV

Spearman's correlation was used to determine the correlation between the stability index generated by NormFinder and the COV values using the equation $r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$, where $r$ is the Spearman's correlation, $d$ is the difference in the rank of the two parameters, and $n$ is the sample size. The t-statistic was calculated by the equation $t = r \sqrt{\frac{n - 2}{1 - r^2}}$, which was used to test the null hypothesis of no correlation with $(n - 2)$ degrees of freedom.
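
The rank-difference calculation can be sketched as below. The helper names are hypothetical, the ranking assumes no tied values (which the d^2 form of the formula requires), and the t-statistic is taken to be the standard r * sqrt((n - 2) / (1 - r^2)). The five value pairs are the weighted COV and weighted NormFinder values of mdoG, dapA, hslV, mrdB, and fucU from Table 1, used only to show the mechanics on a small subset, not to reproduce the reported analysis.

```python
from math import sqrt


def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_r(x, y):
    """Spearman's correlation from squared rank differences."""
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def t_statistic(r, n):
    """t-statistic for testing no correlation, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


cov_values = [0.099, 0.106, 0.111, 0.101, 0.107]
normfinder_values = [0.082, 0.090, 0.105, 0.114, 0.117]

r = spearman_r(cov_values, normfinder_values)
t = t_statistic(r, len(cov_values))
print(r, t)
```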

3. Results and Discussion

The 10% of genes with the lowest COV in each dataset were taken as stably expressed, and the intersection across the three MG1655 datasets (GDS680, GDS1099, and GDS1494) was identified. A total of 39 genes of consistently low variance were found (Table 1), with weighted COV values ranging from 0.099 to 0.138. Glucan biosynthesis protein G (mdoG) was the most stable, with both the lowest weighted COV value and the lowest weighted NormFinder Stability Index for MG1655. In GDS1827 (W3110), mdoG was also the most stable gene in the dataset, with a COV of 0.088 and a NormFinder Stability Index of 0.078. The highest COV in GDS1827 was 0.791, for hslV (peptidase component of the HslUV protease). Our results suggest that mdoG may be a suitable reference gene across both E. coli substrains W3110 and MG1655, which may imply that mdoG is suitable for use as a reference gene in other substrains of E. coli K-12.

Table 1

Weighted mean COV and NormFinder stability index of 39 invariant genes across 3 datasets (MG1655).

| Gene symbol | Gene name | Weighted COV value | Weighted NormFinder Stability Index |
|---|---|---|---|
| mdoG | Glucan biosynthesis protein G | 0.099 | 0.082 |
| dapA | Dihydrodipicolinate synthase | 0.106 | 0.090 |
| crp | DNA-binding transcriptional dual regulator | 0.106 | 0.102 |
| hslV | Peptidase component of the HslUV protease | 0.111 | 0.105 |
| mrdB | Cell wall shape-determining protein | 0.101 | 0.114 |
| fucU | L-Fucose mutarotase | 0.107 | 0.117 |
| yjgP | LPS transport (lptF) | 0.105 | 0.117 |
| yigC | 3-Octaprenyl-4-hydroxybenzoate decarboxylase | 0.126 | 0.117 |
| sun | 16S rRNA m(5)C967 methyltransferase, S-adenosyl-L-methionine-dependent | 0.130 | 0.119 |
| gor | Glutathione oxidoreductase | 0.117 | 0.126 |
| hflB | ATP-dependent metalloprotease | 0.127 | 0.130 |
| yqiB | Predicted dehydrogenase | 0.125 | 0.134 |
| murG | N-Acetylglucosaminyl transferase | 0.124 | 0.134 |
| yrbG | Predicted calcium/sodium:proton antiporter | 0.122 | 0.134 |
| yejK | Nucleotide associated protein | 0.120 | 0.141 |
| yfgA | Cytoskeletal protein required for MreB assembly | 0.118 | 0.142 |
| hflX | Putative GTPase HflX | 0.105 | 0.142 |
| spoT | Bifunctional (p)ppGpp synthetase II/guanosine-3′,5′-bis pyrophosphate 3′-pyrophosphohydrolase | 0.117 | 0.143 |
| holC | DNA polymerase III, chi subunit | 0.134 | 0.144 |
| xerD | Site-specific tyrosine recombinase | 0.114 | 0.146 |
| tolB | Periplasmic protein | 0.115 | 0.146 |
| yheS | Fused predicted transporter subunits of ABC superfamily: ATP-binding components | 0.110 | 0.146 |
| ntpA | Dihydroneopterin triphosphate pyrophosphatase | 0.118 | 0.147 |
| yabB | Conserved protein, MraZ family | 0.115 | 0.148 |
| lolA | Chaperone for lipoproteins | 0.117 | 0.153 |
| yggD | Predicted DNA-binding transcriptional regulator | 0.116 | 0.153 |
| pnp | Polynucleotide phosphorylase/polyadenylase | 0.110 | 0.155 |
| yrbB | ABC transporter maintaining OM lipid asymmetry, cytoplasmic STAS component | 0.123 | 0.156 |
| rnc | RNase III | 0.117 | 0.157 |
| xerC | Site-specific tyrosine recombinase | 0.138 | 0.160 |
| rfaF | ADP-heptose:LPS heptosyltransferase II | 0.120 | 0.161 |
| yigP | Conserved protein, SCP2 family | 0.122 | 0.164 |
| gyrB | DNA gyrase, subunit B | 0.126 | 0.164 |
| nagC | DNA-binding transcriptional dual regulator, repressor of N-acetylglucosamine | 0.132 | 0.165 |
| nrdR | Conserved protein | 0.118 | 0.168 |
| hemD | Uroporphyrinogen III synthase | 0.108 | 0.169 |
| pheT | Phenylalanine tRNA synthetase, beta subunit | 0.124 | 0.171 |
| frr | Ribosome recycling factor | 0.129 | 0.173 |
| cls | Cardiolipin synthase 1 | 0.129 | 0.181 |

Gene ontology overrepresentation is a commonly used analysis method for gaining biological insight into a list of genes [26-28]. Analysis of the 39 genes with consistently low variance showed that 3 primary functions were overrepresented (Table 2): cell division, carbohydrate metabolic process, and protein synthesis. As E. coli is generally accepted to be a rapidly dividing prokaryote [29], it is plausible to expect genes responsible for cell division to be constantly expressed. As the cells grow, new cellular components, such as cell wall and enzymes, need to be synthesized. Hence, it is plausible to expect protein synthesis to be stable throughout the cell cycle. Glutathione [30, 31] and tetrapyrrole [32] have been implicated in protein synthesis, while diaminopimelate has been shown to have a role in the maintenance of the cell wall [33]. At the same time, cell division involves the replication and segregation of genetic material [34]. Carbohydrate is both a primary source of energy for E. coli [35] and the primary component of the bacterial cell wall [36]. The gene mdoG has been shown to be involved in the formation of the β-1,6 glucose linkage [37] and in the periplasmic release of newly synthesized osmoregulated periplasmic glucans [38, 39], which are needed for the bacterial cell wall. Thus, it is plausible that the expression of mdoG is needed during binary fission. As E. coli divides rapidly, constant synthesis of cell wall is needed. Therefore, it is likely that mdoG is constantly needed, which may be a reason for its constant expression in E. coli. Hence, both the gene ontology overrepresentation and the function of the most stably expressed gene, mdoG, are consistent with the short generation time of E. coli.

Table 2

Gene ontology overrepresentation of the 39 invariant genes.

| Primary function | GO ID | Gene ontology term | P value |
|---|---|---|---|
| Cell division | GO:0071139 | Cell cycle | 5.51E-61 |
| Cell division | GO:0006276 | Plasmid recombination | 6.01E-31 |
| Cell division | GO:0016051 | Cell division | 2.90E-16 |
| Cell division | GO:0006432 | Plasmid maintenance | 2.90E-16 |
| Cell division | GO:0042594 | Response to starvation | 3.39E-11 |
| Cell division | GO:0007049 | Chromosome segregation | 6.61E-09 |
| Cell division | GO:0017038 | Resolution of recombinant intermediates | 1.21E-08 |
| Cell division | GO:0006004 | Guanosine tetraphosphate metabolic process | 1.21E-08 |
| Cell division | GO:0042953 | Lysogeny | 4.20E-07 |
| Cell division | GO:0051301 | Diaminopimelate biosynthetic process | 3.50E-06 |
| Carbohydrate metabolism | GO:0030259 | Carbohydrate biosynthetic process | 2.34E-31 |
| Carbohydrate metabolism | GO:0006749 | Lipid glycosylation | 2.34E-31 |
| Carbohydrate metabolism | GO:0007059 | Fucose metabolic process | 8.08E-13 |
| Protein synthesis | GO:0030069 | Phenylalanyl-tRNA aminoacylation | 2.34E-31 |
| Protein synthesis | GO:0016075 | RNA catabolic process | 2.34E-31 |
| Protein synthesis | GO:0015969 | Lipoprotein transport | 2.90E-16 |
| Protein synthesis | GO:0019277 | Phenylalanyl-tRNA aminoacylation | 2.90E-16 |
| Protein synthesis | GO:0042150 | Protein import | 3.39E-11 |
| Protein synthesis | GO:0006396 | RNA processing | 2.48E-09 |
| Protein synthesis | GO:0033014 | Tetrapyrrole biosynthetic process | 1.21E-08 |
| Protein synthesis | GO:0019877 | Glutathione metabolic process | 4.20E-07 |

Our results showed that none of the 7 housekeeping genes consistently appeared in the lowest 10% COV subset of every dataset (Table 3), although GAPDH [9], gyrA [8], and alaS [13] were each found in the lowest 10% COV subset of one dataset. Of the 7, recA [8] has the highest weighted COV (0.5378) and gyrA [8] has the lowest (0.1607), which is still higher than that of mdoG (0.099). This suggests that commonly used housekeeping genes such as GAPDH [9] and recA [8] are not suitable for the expression profiling of E. coli. Hence, our results support our earlier hypothesis that common housekeeping genes found to be stable in one organism cannot be assumed to be stable in all organisms, and they point to the need to identify suitable reference genes for each organism of interest.

Table 3

Seven housekeeping genes and their mean COV values across 4 datasets.

| Gene symbol | Gene name | Weighted COV value |
|---|---|---|
| recA | Recombinase A | 0.537752244 |
| proC | Pyrroline-5-carboxylate reductase | 0.257211257 |
| gyrA | DNA gyrase | 0.16070208 |
| map | Methionine aminopeptidase | 0.282917613 |
| rpoC | RNA polymerase, beta prime subunit | 0.273422333 |
| alaS | Alanyl-tRNA synthetase | 0.160829965 |
| GAPDH | Glyceraldehyde-3-phosphate dehydrogenase | 0.230513521 |

The advantage of COV is its capability to analyse as large a number of samples as required [23], as the number of calculations increases proportionally to the sample size, resulting in linear complexity. NormFinder uses residual analysis between sample subgroup variation and the overall variation of the expression dataset to evaluate the variation contributed by each gene in the entire dataset [20]. Thus, the computational complexity of NormFinder increases exponentially as the number of samples increases; hence, it is only able to work with a small number of genes within reasonable time and computational resources. We therefore used Spearman's rank correlation coefficient to determine the correlation between the NormFinder stability index and the COV values; the sum of $d^2$ was 5664 and the P value was 0.006748. Since the P value was less than 0.01, the null hypothesis is rejected, indicating that the stability index from NormFinder and the COV values are correlated. The strength of this correlation is difficult to establish, however, as a significant P value does not indicate correlation strength. Our results also do not suggest that COV is a suitable replacement for NormFinder. As NormFinder [20] takes account of the overall variability in the entire dataset, it is likely to be statistically stronger than COV, which is a normalized standard deviation. Given the ability of COV to process large amounts of data such as those derived from microarrays, it is plausible that COV can be used as a weaker first filter for a broad category of genes with low expression variation, followed by stronger statistical analysis with NormFinder [20] to identify suitable reference genes.
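
The reported figures can be checked with a short calculation. The sketch below assumes that n = 39 (the number of invariant genes), that the standard t-statistic r * sqrt((n - 2) / (1 - r^2)) was used, and that the P value is two-tailed; none of these is stated explicitly with the result. The tail probability is obtained by Simpson integration of the Student t density so that only the standard library is needed; the integration bounds and step count are arbitrary choices.

```python
from math import exp, lgamma, log, pi, sqrt


def spearman_from_d2(sum_d2, n):
    """Spearman's correlation from the sum of squared rank differences."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def t_pdf(x, df):
    """Probability density of Student's t distribution with `df` degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def two_tailed_p(t, df, upper=100.0, steps=20000):
    """Two-tailed P value by Simpson integration of the t density from |t|
    to `upper`; `steps` must be even. The tail beyond `upper` is ignored."""
    a = abs(t)
    h = (upper - a) / steps
    total = t_pdf(a, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 2 * total * h / 3


n = 39  # assumed: one rank pair per invariant gene
r = spearman_from_d2(5664, n)
t = r * sqrt((n - 2) / (1 - r ** 2))
p = two_tailed_p(t, n - 2)
print(f"r = {r:.3f}, t = {t:.2f}, P = {p:.4f}")
```

Under these assumptions r is about 0.427 and t is about 2.87 on 37 degrees of freedom, and the resulting P value falls between 0.005 and 0.01, in line with the reported 0.006748. An r of this size would correspond to a moderate positive correlation between the two rankings.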