Lili Niu, Matthias Mann.
Abstract
In this study, we faced the challenge of deciphering a protein that was designed and expressed in E. coli such that its amino acid sequence encodes two concatenated English sentences. The letters 'O' and 'U' in the sentences are both replaced by 'K' in the protein. The sequence could not be found online and carried to-be-discovered modifications. With this limited information in hand, we developed a workflow consisting of bottom-up proteomics, de novo sequencing and a bioinformatics pipeline for data processing and for searching for frequently appearing words. We assembled the complete first sentence, a question: "Have you ever wondered what the most fundamental limitations in life are?", and validated the result by a sequence database search against a customized FASTA file. We also searched the spectra against an E. coli proteome database and found close to 600 endogenous, co-purified E. coli proteins and contaminants introduced during sample handling, which made the inference of the sentence very challenging. We conclude that E. coli can express English sentences, and that de novo sequencing combined with clever sequence database search strategies is a promising tool for the identification of uncharacterized proteins.
Keywords: De novo sequencing; Dictionary search; EuPA YPIC challenge; Mass-spectrometry; Proteomics
Year: 2019 PMID: 31890553 PMCID: PMC6924291 DOI: 10.1016/j.euprot.2019.07.010
Source DB: PubMed Journal: EuPA Open Proteom ISSN: 2212-9685
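The letter substitution and the word search described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy peptide reads and the small vocabulary are all hypothetical, and plain substring matching stands in for whatever scoring the actual workflow used. The sketch encodes each candidate word the way the protein does (O and U become K) and counts how many de novo reads contain it.

```python
from collections import Counter

# O and U have no one-letter amino acid code; the abstract states both map to K.
SUBSTITUTIONS = str.maketrans({"O": "K", "U": "K"})


def encode(text: str) -> str:
    """Turn English text into the protein alphabet: letters only, O/U -> K."""
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    return letters.translate(SUBSTITUTIONS)


def count_word_hits(peptides, vocabulary):
    """Count, per word, how many peptide reads contain its encoded form."""
    hits = Counter()
    for word in vocabulary:
        pattern = encode(word)
        for peptide in peptides:
            if pattern in peptide:
                hits[word] += 1
    return hits


# Hypothetical de novo reads (illustrative only, not data from the study).
peptides = ["HAVEYKKEVER", "EVERWKNDERED", "WHATTHEMKST"]
# Hypothetical vocabulary; a real search would use a full dictionary.
vocabulary = ["have", "you", "ever", "wondered", "what", "the", "most", "protein"]

hits = count_word_hits(peptides, vocabulary)
for word, n in hits.most_common():
    print(f"{word}: found in {n} read(s)")
```

Because K in a read may stand for O, U or a genuine K, encoding the dictionary once is simpler than enumerating every possible decoding of each read. Words that recur across overlapping reads are natural anchors for assembling the sentence.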
Fig. 1. Analysis workflow.
Fig. 2. MS/MS spectra of sequences generated on Orbitrap Fusion Lumos under ETD mode.
Fig. 3. MS/MS spectra of sequences generated on Orbitrap Fusion Lumos under CID mode and on Q Exactive HF-X under HCD mode.
Fig. 4. Validation of found sentences by sequence database search.
a. Assembled peptide sequence, corresponding sentence and matched fragment ions of each sequence. b. All identified peptides from the sample. Peptides belonging to the sentence were highlighted in blue. c. All identified proteins from the sample, with the sentence, contaminants and co-purified proteins of E. coli color-coded. d. Summed raw intensity of the sentence, contaminants and co-purified proteins from E. coli.