Peijia Yuan1,2, Nadia G D'Lima1,2, Sarah A Slavoff1,2,3. 1. Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States. 2. Chemical Biology Institute, Yale University, West Haven, Connecticut 06516, United States. 3. Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06529, United States.
Abstract
Recent advances in proteomics and genomics have enabled discovery of thousands of previously nonannotated small open reading frames (smORFs) in genomes across evolutionary space. Furthermore, quantitative mass spectrometry has recently been applied to analysis of regulated smORF expression. However, bottom-up proteomics has remained relatively insensitive to membrane proteins, suggesting they may have been underdetected in previous studies. In this report, we add biochemical membrane protein enrichment to our previously developed label-free quantitative proteomics protocol, revealing a never-before-identified heat shock protein in Escherichia coli K12. This putative smORF-encoded heat shock protein, GndA, is likely to be ∼36-55 amino acids in length and contains a predicted transmembrane helix. We validate heat shock-regulated expression of the gndA smORF and demonstrate that a GndA-GFP fusion protein cofractionates with the cell membrane. Quantitative membrane proteomics therefore has the ability to reveal nonannotated small proteins that may play roles in bacterial stress responses.
Despite their varied and often
essential functions, small proteins have been consistently underannotated
in both prokaryotic and eukaryotic genomes.[1] Small open reading frame (smORF)-encoded small proteins function
in bacteria as regulators of sporulation, cell division, membrane
transport, membrane-bound enzymes, protein kinases, and chaperones.[1−7] In a study of 51 recently discovered small Escherichia
coli proteins, 21 were upregulated under a specific
stress or growth condition.[8] Notably, 90%
of the small proteins that exhibited regulated expression were predicted
to contain single transmembrane helices.[8] It is therefore reasonable to hypothesize that additional small,
membrane-associated bacterial stress response proteins remain to be
discovered. Of the three leading technologies for smORF discovery,
computational genomics,[9] ribosome footprinting,[10,11] and liquid chromatography–tandem mass spectrometry proteomics
(LC–MS/MS),[12−15] LC–MS/MS has the advantage of direct detection of peptides
derived from nonannotated proteins and has recently been extended
to quantitative analysis.[16−19] However, bottom-up LC–MS/MS proteomics affords
relatively poor detection of membrane proteins due to their low abundance
and hydrophobicity,[20,21] suggesting that membrane-associated,
nonannotated small proteins may have been missed by previous quantitative
LC–MS/MS studies. To address this limitation, we present a
workflow for quantitative membrane proteomics. We apply this methodology
to the E. coli K12 heat shock response,
enabling the discovery of a previously nonannotated, membrane-associated
small heat shock protein, which we provisionally name GndA.

We and others recently reported a label-free quantitation protocol
for comparative profiling of nonannotated peptides between two conditions.[16,19] Figure 1A provides
an overview of a membrane-focused quantitative proteomic workflow.
Briefly, E. coli K12 substr. MG1655
is grown under standard (control) conditions or subjected to heat
shock, then lysed. Cell membranes are pelleted via ultracentrifugation,
and the membrane proteome is resolubilized and separated on a peptide
gel.[15] Protein bands of low molecular weight
are excised and subjected to trypsin digest. The digest is then fractionated
by electrostatic repulsion hydrophilic interaction chromatography
(ERLIC), and the fractions are analyzed by LC–MS/MS. Subsequently,
the data are searched against a 6-frame translation of the E. coli K12 MG1655 genome using MASCOT, permitting
identification of both known and nonannotated peptides. Annotated
peptides are excluded with a string-matching algorithm[14] via comparison to the E. coli K12 MG1655 proteome. For semiquantitative, comparative analysis
of peptide abundance, sequences detected only in the heat shock sample
and not the control are identified, then MS1 extracted ion chromatogram
(EIC) peak intensities at the same retention time are compared.
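
The database and filtering steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the MASCOT search itself is not reproduced, the function names (`six_frame_translation`, `is_annotated`, `heat_shock_specific`) are hypothetical, and peptide identifications are assumed to arrive as plain sets of sequence strings. The sketch shows how a genome can be translated in all six reading frames, how annotated peptides can be excluded by substring matching against the known proteome, and how peptides seen only in the heat shock sample can be selected.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frame_translation(genome):
    """Return translations of the three forward and three reverse frames."""
    genome = genome.upper()
    reverse = genome.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(genome[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def is_annotated(peptide, proteome):
    """String matching: a peptide is annotated if any known protein contains it."""
    return any(peptide in protein for protein in proteome)


def heat_shock_specific(heat_shock_peptides, control_peptides, proteome):
    """Nonannotated peptides identified in heat shock but not in the control."""
    only_heat_shock = set(heat_shock_peptides) - set(control_peptides)
    return {p for p in only_heat_shock if not is_annotated(p, proteome)}
```

Frames here are numbered relative to the start of the input sequence, so the labels are only meaningful once a coordinate system (for example, the start of an annotated gene) has been fixed.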
Figure 1
Discovery of
small open reading frame (smORF)-encoded membrane
proteins through quantitative proteomics. (A) Overview of membrane-targeted
quantitative proteomic discovery protocol. (B) MS/MS spectrum corresponding
to an unannotated tryptic peptide fragment detected only in the heat
shock sample is shown. Identified y-ions and b-ions are shown in red
on the spectrum and indicated on the peptide sequence to which the
spectrum was matched. (C) Extracted ion chromatograms (EICs) comparing
peaks (shown in stick mode) corresponding to the peptide ion m/z value detected in (B) in heat shock
and control conditions at the same retention time. The same y-axis scale is used in both conditions. A viewing window
of 1 Da around the parent ion mass is used.
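
The extracted ion chromatogram comparison in panel (C) can be illustrated with a short sketch. It assumes, purely for illustration, that MS1 data are available as a list of `(retention_time, [(mz, intensity), ...])` tuples and that the 1 Da viewing window is centered on the parent ion m/z (that is, ±0.5 Da); neither the data layout nor the function names come from the original workflow, and the retention time tolerance is an arbitrary choice.

```python
def extract_ion_chromatogram(scans, target_mz, window_da=1.0):
    """Sum MS1 intensity within a window centered on target_mz for each scan.

    scans: iterable of (retention_time_min, [(mz, intensity), ...]).
    Returns a list of (retention_time_min, summed_intensity).
    """
    half = window_da / 2.0
    eic = []
    for retention_time, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= half)
        eic.append((retention_time, intensity))
    return eic


def max_intensity_near(eic, retention_time, tolerance_min=0.5):
    """Largest EIC intensity within tolerance_min of the given retention time."""
    nearby = [i for rt, i in eic if abs(rt - retention_time) <= tolerance_min]
    return max(nearby, default=0.0)


# Hypothetical toy scans, not data from this study.
heat_shock_scans = [
    (30.0, [(650.3, 1000.0), (700.0, 50.0)]),
    (30.2, [(650.4, 5000.0)]),
    (35.0, [(650.3, 200.0)]),
]
control_scans = [
    (30.0, [(700.0, 60.0)]),
    (30.2, [(702.1, 40.0)]),
]

hs = max_intensity_near(extract_ion_chromatogram(heat_shock_scans, 650.35), 30.1)
ctrl = max_intensity_near(extract_ion_chromatogram(control_scans, 650.35), 30.1)
print(f"heat shock: {hs}, control: {ctrl}")
```

Comparing the two values at a shared retention time mirrors the semiquantitative presence/absence call; it is not a substitute for replicate-based statistics.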

Prior to analysis of nonannotated sequences, we first validated
our workflow’s ability to quantify differential expression
of peptides from an annotated heat shock protein. Comparing the EICs
for selected tryptic fragments of the heat shock protein DnaJ (Hsp40)
and a known nonheat shock protein, 50S ribosomal subunit protein L6
(RplF), verified that the DnaJ peptide was detected only during heat
shock, while the RplF peptide was detected equally under both normal
growth and heat shock (Supporting Information (SI), Figure S1), as expected. Therefore, we reliably distinguished
heat shock responsive vs constitutive expression using label-free
quantitation.

Second, we analyzed our workflow’s size
selectivity. To
do so, we first plotted the sizes of all annotated proteins identified
in both our heat shock and control samples that were subjected to
membrane enrichment. This analysis revealed a clear enrichment of
small proteins, with the most commonly detected protein sizes ranging
from 10 to 20 kDa (SI, Figure S2A), similar
to size distributions obtained for soluble proteins in past LC–MS/MS
proteomics studies of smORFs.[14−16]

Finally, we confirmed that
we obtained an enrichment in peptides
derived from membrane proteins by comparing our membrane-enriched
control sample (not subjected to heat shock) to a previously reported
sample grown under similar conditions that was not subjected to membrane
preparation.[19] We compared all of the annotated
proteins identified using our membrane-enriched sample and the sample
without membrane enrichment against a list of all E.
coli K12 substr. MG1655 membrane proteins obtained
from EcoCyc. These searches showed that 412/1208, or 34%, of annotated
proteins detected from the membrane enrichment workflow had a membrane
localization annotation, as opposed to 488/1849, or 26%, of annotated
proteins detected from the regular workflow without enriching for
membrane proteins (SI, Figure S2B). Of
these proteins with membrane annotation, we detected peptides from
135 of them only in the workflow with membrane enrichment. These results
suggest that our workflow provides an enhancement in the detection
of peptides derived from membrane proteins while retaining small size
selectivity.

The results of our proteomic analysis of heat shock
and control
samples are presented in SI, Proteomic results, and protein-level identifications are ranked according to sequence
coverage. Because we focused on molecular genetic validation rather
than statistical analysis of replicates to identify GndA as a heat
shock protein (vide infra), we note that only a single experimental
replicate is presented, so any other candidate heat shock-specific
peptides must be considered putative. Nevertheless, our data set may
aid hypothesis generation about regulated expression of predicted
proteins. For example, peptides mapping to four known or predicted
small proteins without currently annotated heat shock functions were
detected in the heat shock sample but not in the control sample (SI, Figure S3 and S4). Two of these proteins are
known or predicted to localize to the membrane (YfgG and YghG), and
three currently lack functional characterization. Further experiments
will be required to test heat shock responsive expression of these
proteins.

In our heat shock data set, we identified precisely
one nonannotated
tryptic peptide exhibiting excellent sequence coverage (Figure 1B). Comparative analysis of
the extracted ion chromatogram for this nonannotated tryptic peptide
revealed MS1 ion intensity in our heat shock sample and not in the
control (Figure 1C).
This nonannotated peptide maps uniquely to an open reading frame (ORF)
that is contained entirely within the gene gnd in
the +2 reading frame (Figure 2). The putative protein that would be produced by translation
of this ORF would therefore be completely different from Gnd at the
amino acid level. Because of its coencoding with gnd, we refer to the smORF as gndA. There are two in-frame
ATG codons upstream of the sequence putatively encoding the peptide
detected by LC–MS/MS, either of which could plausibly initiate
translation of GndA (Figure 2B). The length of GndA would thus most likely be 36–54
amino acids. Because bottom-up proteomics does not provide full sequence
coverage for this putative protein, we have not yet confirmed the
start codon or complete primary sequence for GndA, and it remains
possible that neither in-frame ATG codon is the correct start site
for this protein.
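
The reasoning about candidate start codons can be made concrete with a small sketch. Given a coding-strand DNA sequence and a detected peptide, it finds the forward reading frame that encodes the peptide, then lists every in-frame ATG between the nearest upstream stop codon and the peptide, together with the protein each would produce. The function name `candidate_proteins` and the toy sequence are hypothetical; the real gnd sequence is not reproduced here, only forward frames are searched, and only ATG is considered as a start codon, which is an assumption of this sketch.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def candidate_proteins(dna, peptide):
    """List (frame, atg_position, protein) for each in-frame ATG upstream of peptide.

    frame is 1-based (+1, +2, +3 relative to dna[0]); atg_position is the
    0-based nucleotide index of the A in the candidate ATG.
    """
    dna = dna.upper()
    results = []
    for offset in range(3):
        protein = translate(dna[offset:])
        pos = protein.find(peptide)  # first occurrence only
        if pos == -1:
            continue
        upstream_stop = protein.rfind("*", 0, pos)  # -1 if no upstream stop
        downstream_stop = protein.find("*", pos)
        end = downstream_stop if downstream_stop != -1 else len(protein)
        # Methionine is encoded only by ATG, so each 'M' marks an ATG codon.
        for i in range(upstream_stop + 1, pos + 1):
            if protein[i] == "M":
                results.append((offset + 1, offset + 3 * i, protein[i:end]))
    return results


# Hypothetical toy sequence: the +2 frame reads TAA ATG GCT ATG AAA CTG GTT CGT TGA.
toy_dna = "GTAAATGGCTATGAAACTGGTTCGTTGACC"
for frame, atg, protein in candidate_proteins(toy_dna, "LVR"):
    print(f"frame +{frame}, ATG at {atg}, length {len(protein)} aa: {protein}")
```

Each candidate start yields a different predicted length, which is why the length of the protein can only be given as a range until the true start site is confirmed experimentally.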
Figure 2
Location of the nonannotated gene, gndA, within
the E. coli MG1655 genome. (A) A gene
locus diagram shows the coordinate of the stop codon downstream of
a frame-shifted sequence within the annotated gnd gene. Sizes are proportional to gene lengths and directionality
of coding sequences is indicated with arrows. (B) The coding sequence
of gnd is shown with the sequence corresponding to
the tryptic peptide fragment detected by MS/MS bolded and underlined.
Highlighted in red are two upstream, in-frame candidate ATG start
codons.

Because we identified only a single
tryptic peptide that mapped
to GndA, rigorous molecular genetic confirmation of its expression
was required. We verified that gndA was expressed
and upregulated during heat shock by generating a chromosomally tagged
strain with the coding sequence for the tandem epitope tag SPA[8] integrated at the 3′ end of the predicted gndA smORF. We confirmed the site of SPA tag insertion via
integration check PCR and sequencing (SI, Figure S5). We grew the SPA-tagged strain under control and heat shock
conditions and specifically detected expression of an immunoreactive
band during heat shock (Figure 3). (Many membrane proteins exhibit anomalous mobility in SDS-PAGE,[22] so the apparent migration of GndA-SPA may not
exactly correlate with its molecular weight.) This result is consistent
with expression of a small protein in the gndA reading
frame during heat shock.
Figure 3
gndA is expressed and upregulated
during heat
shock. (A) An E. coli MG1655 strain
was generated with the SPA epitope tag (followed by a kanamycin selection
marker, kan) introduced at the C-terminus of GndA. (B) Cell lysates
of SPA-tagged and wild-type E. coli MG1655 strains grown at 30 °C (control) and 45 °C (heat
shock) were separated on a 16% tricine gel and stained with Coomassie
blue (right). Western blotting was performed on the same samples using
anti-FLAG antibody to detect a portion of the SPA tag (left).

In the absence of a complete assignment of the gndA coding sequence, it remained possible that the observed peptide
was generated via an alternative mechanism, such as ribosomal frameshifting
during translation of 6-phosphogluconate dehydrogenase (Gnd), the
protein product of gnd. We therefore confirmed that
GndA can be translated independently. We generated a pET21a plasmid
containing the genomic sequence spanning the annotated ATG start
codon of gnd to the stop codon of gndA, with GFP fused to the C-terminus of GndA to enhance stability and
enable immunoblotting. We also generated a second construct in which the start codon of gnd was deleted. Expression of
both constructs in BL21 cells produces the same product,
which migrates at a slightly higher apparent molecular weight than
GFP alone (SI, Figure S6). This result
is consistent with independent translation of GndA, although it does
not exclude all alternative interpretations.

Bioinformatic and
biochemical analyses suggest that the predicted
primary sequence of GndA may correspond to a small transmembrane protein.
A portion of the putative GndA sequence (Figure 4A), highlighted in red, was predicted by
three programs (TMPred, Phobius,[23,24] and PredictProtein[25]) to form a transmembrane helix. Using the GFP
fusion construct employed in SI, Figure S6, we performed a membrane fractionation. We verified by Western blotting
that GndA-GFP is highly enriched in the membrane pellet after ultracentrifugation
as compared to total clarified lysate and the soluble fraction, consistent
with membrane localization (Figure 4C). A BLAST search against the NCBI nonredundant protein
database did not reveal significant homology between GndA and known
proteins (data not shown), and the predicted primary sequence of GndA
lacks a signal sequence. Therefore, determination of the full sequence,
function, mechanism of membrane insertion, inner vs outer membrane
localization, and orientation of GndA in the membrane will require
further study.
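
For readers unfamiliar with transmembrane helix prediction, the following sketch shows the simplest form of the idea: a sliding-window average of Kyte-Doolittle hydropathy values, with windows of about 19 residues and a mean above roughly 1.6 flagged as candidate membrane-spanning segments. This is a deliberately simplified stand-in and is not the algorithm used by TMPred, Phobius, or PredictProtein; the function name and both example sequences are hypothetical and are not the GndA sequence.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(sequence, window=19, threshold=1.6):
    """Return (start_index, mean_hydropathy) for windows at or above threshold.

    A run of overlapping hits suggests a candidate transmembrane segment.
    Sequences shorter than the window yield no hits.
    """
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if mean >= threshold:
            hits.append((start, round(mean, 2)))
    return hits


# Hypothetical sequences: a hydrophobic core flanked by charged residues,
# and a uniformly polar control.
membrane_like = "MKRSE" + "LLIVALLAVIGLLAVLIAL" + "RKDEQ"
soluble_like = "MKRSEDKRSEDKRSEDKRSEDKRSE"

print(hydrophobic_windows(membrane_like))
print(hydrophobic_windows(soluble_like))
```

A hydropathy scan of this kind cannot by itself establish membrane insertion or topology, which is why biochemical fractionation is still needed.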
Figure 4
GndA is enriched in the membrane fraction. (A) The hypothetical
primary sequence of GndA contains a predicted transmembrane helix
(red). (B) BL21 cells were transformed with a pET21a plasmid encoding
a GndA-GFP fusion protein. (C) Cell lysates were fractionated, separated
on a 16% tricine gel, and stained with Coomassie blue as a loading
control (right). Western blotting was performed on the same samples,
probing using anti-GFP antibody (left). kDa, molecular weight ladder;
CL, clarified lysate; S, soluble fraction; M, membrane pellet; PL,
preclarified lysate.

In summary, we have developed an LC–MS/MS method to
detect
a peptide derived from a nonannotated small membrane protein regulated
by heat shock, GndA. Notably, gndA would have been
difficult to identify through alternative approaches to smORF discovery,
including bioinformatics and ribosome footprinting, because the frameshifted gndA coding sequence is completely contained within the
larger gnd sequence. Thus, our method presents a
complementary approach to new gene discovery. In the future, we anticipate
that this method can be extended to profiling of nonannotated membrane
proteins expressed under different stress conditions and in other
organisms.