MODi: a powerful and convenient web server for identifying multiple post-translational peptide modifications from tandem mass spectra.

Sangtae Kim, Seungjin Na, Ji Woong Sim, Heejin Park, Jaeho Jeong, Hokeun Kim, Younghwan Seo, Jawon Seo, Kong-Joo Lee, Eunok Paek.

Abstract

MOD(i) (http://modi.uos.ac.kr/modi/) is a powerful and convenient web service that facilitates the interpretation of tandem mass spectra for identifying post-translational modifications (PTMs) in a peptide. It is powerful in that it can interpret a tandem mass spectrum even when hundreds of modification types are considered and the number of potential PTMs in a peptide is large, in contrast to most of the currently available methods for spectral interpretation, which limit the number of PTM sites and types used for PTM analysis. For example, using MOD(i), one can consider both the entire PTM list published on the unimod webpage (http://www.unimod.org) and user-defined PTMs simultaneously, and one can also identify multiple PTM sites in a spectrum. MOD(i) is convenient in that it accepts various input file formats, such as .mzXML, .dta, .pkl and .mgf files, and it is equipped with a graphical tool called MassPective, which displays MOD(i)'s output in a user-friendly manner and helps users understand that output quickly. In addition, one can perform manual de novo sequencing using MassPective.

Substances: Peptides

Year: 2006 PMID: 16845006 PMCID: PMC1538808 DOI: 10.1093/nar/gkl245

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Identification of post-translational modifications (PTMs) is important for understanding the cellular functions of proteins (1). Sensitive methodologies based on conventional biochemical methods are lacking for the identification of PTMs in vivo, but recent advances in proteomic technology, including mass spectrometry, provide an approach to identifying PTMs (2–5). Sequencing by tandem mass spectrometry (MS/MS), which has aided protein identification, offers tremendous potential for detecting PTMs (1). Three approaches have been used to automatically interpret tandem mass spectra for peptide sequencing (6), namely database searching (7–9), de novo peptide sequencing (10–12) and the sequence tag approach (13,14), and there have also been efforts to combine these methods (15,16).

Interpretation of experimental spectra is harder if a peptide sequence contains PTMs. One might consider extending one of these approaches in a straightforward manner to sequence a peptide with any number, any kind and any combination of PTMs, i.e. developing a virtual peptide database by incorporating peptides with all possible combinations of PTMs, or extending the set of amino acids by introducing new virtual amino acids such that the mass of each virtual amino acid corresponds to the mass of a post-translationally modified amino acid. However, such extensions yield either an exponential-time algorithm or a polynomial-time algorithm of very high degree. Thus these algorithms cannot interpret tandem mass spectra with multiple PTMs in a reasonable amount of time. Recently there have been efforts to formally define this problem and suggest a way to reduce the time requirement of PTM identification (17,18).

We have developed a convenient method called MOD for rapidly interpreting tandem mass spectra of peptides with multiple PTMs. This method adopts a hybrid approach that combines de novo sequencing with database searching. It performs well even when a large number of potential PTMs and PTM types (>100 modification types) are considered. In addition, we developed a graphical tool called MassPective that shows MOD's output in a user-friendly manner. MassPective enables a user not only to quickly view and understand MOD's output, but also to perform additional manual de novo sequencing so that MOD's interpretation can be manually inspected. By incorporating MOD and MassPective into a web-based interface, we have produced the MOD web service. It will provide the biological community with a fast and convenient vehicle for identifying PTMs from tandem mass spectra.

METHODS

MOD consists of five stages: peak selection, tag discovery, database search, tag chain generation and PTM identification. It assumes that the number of candidate proteins has already been reduced to 20 or fewer by protein identification before the spectra set is analyzed for PTM identification.

Peak selection: We select peaks with relatively high intensities (both globally and locally). The number of peaks selected is proportional to the parent ion mass.

Tag discovery: We perform partial de novo sequencing on the selected peaks to identify all the tags (partial amino acid sequences that do not contain PTMs) of length up to 3.

Database search: Using the tags identified, we search the peptide database for candidate peptides that contain any of the identified tags (called forward tags) or the reverse sequences of the identified tags (called reverse tags) of length at least 3. It should be noted that the peptide database we use does not contain any PTM information. Thus the scalability requirement is satisfied.

Tag chain generation: For each candidate peptide, we build a tag chain. A tag chain for a candidate peptide consists of non-overlapping forward or reverse tags of length at least 2 that occur in the candidate peptide, together with the gaps in between, where each gap is a maximal consecutive amino acid subsequence of the candidate peptide that is not covered by any tag. The difference between the mass of a gap and the size of its aligned segment of the spectrum is called the mass offset for the gap (Figure 1).
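The tag chain notation can be made concrete with a small Python sketch. It is an illustration only, not MOD's implementation: the function names (`locate`, `render_tag_chain`, `gaps_of`) are hypothetical, and the tag placements are chosen by hand to match the histone example, because the rule MOD uses to pick among competing placements is not described here.

```python
import re

def locate(peptide, tag):
    """Return (start, length, is_forward) for every occurrence of a tag,
    read either as written (forward) or reversed (reverse tag)."""
    hits = []
    for is_forward, pattern in ((True, tag), (False, tag[::-1])):
        start = peptide.find(pattern)
        while start != -1:
            hits.append((start, len(tag), is_forward))
            start = peptide.find(pattern, start + 1)
    return hits

def render_tag_chain(peptide, placed_tags):
    """Draw a chain: capitals for forward tags, lower case for reverse
    tags, one underscore per residue not covered by any tag.
    placed_tags must be non-overlapping (start, length, is_forward)."""
    chain = ["_"] * len(peptide)
    for start, length, is_forward in placed_tags:
        segment = peptide[start:start + length]
        chain[start:start + length] = segment if is_forward else segment.lower()
    return "".join(chain)

def gaps_of(peptide, chain):
    """Each maximal run of underscores is one gap; return its residues."""
    return [peptide[m.start():m.end()] for m in re.finditer(r"_+", chain)]

PEPTIDE = "GKGGKGLGKGGAKR"
# Hand-picked placements (an assumption): forward 'GG' at position 2,
# reverse tags covering 'GLG' at position 5 and 'GGA' at position 9.
PLACED = [(2, 2, True), (5, 3, False), (9, 3, False)]

CHAIN = render_tag_chain(PEPTIDE, PLACED)
print(CHAIN)
print(gaps_of(PEPTIDE, CHAIN))
print(locate(PEPTIDE, "AGG"))
```

In this sketch each gap of the histone peptide contains exactly one lysine, which is what allows the four mass offsets to be interpreted independently.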

Figure 1

A tag chain ‘__GG_glg_gga__’ for a spectrum from histone. A sequence of capital letters represents a forward tag, a sequence of small letters a reverse tag and each run of underscores a gap. This tag chain consists of one forward tag ‘GG’, two reverse tags ‘glg’ and ‘gga’ and four gaps in between. The mass offset for a gap is calculated by subtracting the mass of the gap from the size of its aligned segment of the spectrum. For example, the mass offset of the leftmost gap is 41.947 Da, given an aligned segment size of 227.063 (228.063 − 1) and a mass of 185.116 for the sequence ‘GK’ corresponding to the gap (glycine: 57.021, lysine: 128.095).
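The arithmetic in this caption can be reproduced with a few lines of Python. The sketch below uses only the numbers quoted in the caption; the function names are hypothetical and chosen for illustration.

```python
# Residue masses (Da) as quoted in the Figure 1 caption.
RESIDUE_MASS = {"G": 57.021, "K": 128.095}

def gap_mass(gap_sequence, residue_mass=RESIDUE_MASS):
    """Sum of the unmodified residue masses in the gap."""
    return sum(residue_mass[aa] for aa in gap_sequence)

def mass_offset(segment_size, gap_sequence, residue_mass=RESIDUE_MASS):
    """Aligned segment size minus the unmodified gap mass."""
    return segment_size - gap_mass(gap_sequence, residue_mass)

segment = 228.063 - 1            # aligned segment size given in the caption
offset = mass_offset(segment, "GK")
print(round(gap_mass("GK"), 3))  # mass of the gap 'GK'
print(round(offset, 3))          # mass offset of the leftmost gap
```

An offset close to 42 Da on a gap containing a single lysine is what suggests one acetylation in that gap.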

PTM identification: For each gap of a tag chain, we find a set of PTMs that best interprets the gap. We first enumerate candidate sets of PTMs that correspond to the mass offset of each gap, and then select the best candidate set by comparing the partial theoretical spectra generated by each candidate set with the partial experimental spectrum of the gap.
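The enumeration step can be sketched as follows, assuming a tiny illustrative PTM table and at most one PTM per residue. The table, the tolerance values and the function name `candidate_ptm_sets` are assumptions made for this example, not MOD's actual data or interface, and the subsequent scoring against the experimental spectrum is not shown.

```python
from itertools import product

# Illustrative subset of modifications: name -> (mass shift in Da, target residues).
# A real run would draw on the full unimod list plus user-defined PTMs.
PTM_TABLE = {
    "Acetyl": (42.011, "K"),
    "Methyl": (14.016, "KR"),
    "Dimethyl": (28.031, "KR"),
    "Deamidated": (0.984, "NQ"),
    "Oxidation": (15.995, "MW"),
}

def candidate_ptm_sets(gap, offset, tolerance=0.5, ptm_table=PTM_TABLE):
    """Enumerate per-residue PTM assignments for a gap whose summed
    mass shift lies within `tolerance` of the observed mass offset."""
    options = []
    for aa in gap:
        # None means the residue is left unmodified.
        choices = [None] + [name for name, (_, sites) in ptm_table.items()
                            if aa in sites]
        options.append(choices)
    hits = []
    for assignment in product(*options):
        delta = sum(ptm_table[name][0] for name in assignment if name is not None)
        if abs(delta - offset) <= tolerance:
            hits.append(assignment)
    return hits

# Leftmost gap of the histone example: 'GK' with an offset of 41.947 Da.
print(candidate_ptm_sets("GK", 41.947, tolerance=0.1))
# A gap 'NGK' whose offset equals deamidation plus di-methylation.
print(candidate_ptm_sets("NGK", 0.984 + 28.031, tolerance=0.1))
```

Because the enumeration is confined to one gap at a time, the number of assignments stays small even when the PTM table is large; with several candidates surviving, a scoring step of the kind described above would choose among them.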

INPUT, OUTPUT AND PARAMETERS

Input

MOD requires users to input spectra, a protein database and a PTM database.

Spectra. MOD accepts several spectrum formats: ISB mzXML format (*.xml), Thermo Finnigan dta format (possibly compressed to a *.zip file), Micromass pkl format (*.pkl) and Mascot mgf format (*.mgf). For reliable interpretation of PTMs, we recommend that users input spectra with a mass error tolerance <1 Da.

Protein database. The protein database should be in fasta format (*.fasta) and contain 20 or fewer protein sequences, because a large protein database may produce many false positive results.

PTM database. MOD can consider the entire list of PTMs published on the unimod webpage. In addition, users can configure a tailored PTM list composed of selected unimod PTMs and user-defined PTMs. It is possible to save and load user-selected PTM lists.

Parameters

Users can adjust MOD's parameters according to their experimental conditions. The parameters include the maximum number of missed cleavages, mass tolerance, precursor mass tolerance and the enzymes used. In addition, users can fine-tune MOD by setting advanced parameters such as the minimum/maximum offset value per gap, tag chain discard rate, minimum normalized intensity to consider, peak selection window size and minimum/maximum number of peaks in a window. Users can save parameters on a local host to reuse them in later analyses. A more detailed description of each parameter can be found in the help pages of the MOD website.

Output

MOD outputs a unidta file (*.unidta) and a unidrawing file (*.unidrawing). The unidta file is a set of input spectra, and the unidrawing file is an XML-formatted file that contains the interpretations of the input spectra. To guarantee random access, each file ends with a trailing offset list that gives the file offsets of the input spectra.

MassPective

MassPective has been developed to help users understand the interpretation stored in the unidta and unidrawing files. It is a graphical tool that displays MOD's output in a user-friendly manner so that users can grasp it quickly. Through a graphical user interface it shows each spectrum with ion-type annotation, the candidate peptides for the spectrum, the tag chains for the candidate peptides and the possible PTM interpretations for the tag chains. In addition, it enables users to perform additional manual de novo sequencing so that MOD's partial interpretation can be manually annotated with additional sequencing information.

Given a spectrum, MassPective displays MOD's output in three tiled windows (candidate peptide list, detailed information and spectrum windows) and a pop-up gap list window (Figure 2). The candidate peptide list window shows the candidate peptides for the current spectrum and the tag chains of each candidate peptide. By selecting a candidate peptide or a tag chain, a user can see useful information such as the score, offset list or gap list in the detailed information window. If a user clicks a tag chain, a pop-up window showing the gap list appears. The gap list window shows the combinations of PTMs that may occur in each gap together with their scores, and a user can select an interpretation of each gap so that the corresponding annotation is displayed in the spectrum window. The spectrum window displays the selected spectrum together with the annotations the user selects. By clicking buttons in the toolbar, a user can see the spectral alignment of y-ion tags, b-ion tags, theoretical y-ion peaks and theoretical b-ion peaks, respectively.

Figure 2

MassPective screen shot. This is the interpretation of a spectrum corresponding to the peptide ‘GKGGKGLGKGGAKR’ of histone. It shows spectral alignments of y ion tags, theoretical y-ion peaks, and their ion annotation in red and those of their b counterparts in blue.

MassPective can also report a protein summary in CSV (Comma-Separated Values) file format (Figure 3), save the spectral image in *.bmp or *.jpeg format and print the currently displayed spectrum.

Figure 3

A protein summary report generated by MassPective. It summarizes the output of MOD for each identified protein, giving protein information and sequence coverage for the proteins that correspond to the input spectra. In addition, for each interpreted spectrum, it gives the submitted mass, the matched peptide sites in the protein sequence, the match score, a spectrum identifier and the identified PTMs.

MassPective also supports manual de novo sequencing with PTMs by identifying every amino acid (possibly with a PTM) whose mass corresponds to the mass difference between any two peaks in a designated spectral segment (Figure 4).

Figure 4

Manual sequencing by MassPective. If a user designates an area in the spectrum, a window pops up listing every amino acid, possibly carrying one PTM, whose mass agrees with the difference between two peaks in that area whose intensities exceed the intensity value of the designated line.
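The peak-difference lookup described here can be sketched in Python. This is a minimal illustration under stated assumptions, not MassPective's code: the residue and PTM tables are small hand-picked subsets, the peak list is invented for the example, and the function name `explain_peak_pairs` is hypothetical.

```python
from itertools import combinations

# Small illustrative tables (monoisotopic masses in Da).
RESIDUES = {"G": 57.021, "A": 71.037, "L": 113.084, "K": 128.095}
PTMS = {"Acetyl": (42.011, "K")}   # name -> (mass shift, target residues)

def explain_peak_pairs(peaks, min_intensity, tolerance=0.05):
    """For every pair of peaks above the intensity line, report the
    amino acid (with at most one PTM) whose mass matches their difference.
    peaks is a list of (m/z, intensity) tuples from the designated area."""
    strong = sorted(mz for mz, intensity in peaks if intensity > min_intensity)
    found = []
    for low, high in combinations(strong, 2):
        diff = high - low
        for aa, mass in RESIDUES.items():
            if abs(diff - mass) <= tolerance:
                found.append((low, high, aa, None))
            for ptm, (delta, sites) in PTMS.items():
                if aa in sites and abs(diff - (mass + delta)) <= tolerance:
                    found.append((low, high, aa, ptm))
    return found

# Invented peak list for demonstration; the last peak falls below the line.
PEAKS = [(100.0, 50), (157.021, 80), (327.127, 90), (400.0, 5)]
MATCHES = explain_peak_pairs(PEAKS, min_intensity=10)
for low, high, aa, ptm in MATCHES:
    label = aa if ptm is None else f"{aa}+{ptm}"
    print(f"{low:.3f} -> {high:.3f}: {label}")
```

Restricting the search to peaks above a user-drawn intensity line keeps the number of peak pairs, and therefore the number of spurious matches, manageable.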

RESULTS

The MOD web server has been tested by four groups: a group that uses a nano-LC/MS–MS system consisting of an Ultimate HPLC system (LC Packings) for nano-LC and a Q-TOF Ultima Global mass spectrometer (Micromass) equipped with a nano-ESI source, a group that uses a Finnigan LTQ equipped with an ‘in-house’ nano-ESI source, a group that uses a Finnigan LCQ equipped with an ‘in-house’ nano-LC system and a group that uses an Applied Biosystems 4700 Proteomics Analyzer MALDI-TOF/TOF equipped with an Agilent 1100 series capillary HPLC system. Figure 5 shows how MOD successfully interprets a spectrum with multiple modifications. MOD interprets the spectrum as the peptide ‘GKGGKGLGKGGAKR’ with four acetylation sites, one at every lysine in the peptide. As shown in Figure 5, MOD first finds the tag chain ‘__GG_glg_gga__’, computes the mass offsets of the gaps, which are 41.95, 41.92, 41.94 and 41.93 Da, respectively, estimates four acetylation sites from the mass offsets and evaluates the estimation by a simple scoring scheme.

Figure 5

Interpretation of a spectrum with precursor ion 719.663 (2+) corresponding to the peptide ‘GKGGKGLGKGGAKR’ of histone. MOD first finds the tag chain ‘__GG_glg_gga__’ and computes the mass offsets of the gaps, which are 41.95, 41.92, 41.94 and 41.93 Da, respectively (shown in Figure 1). Every lysine is interpreted as acetylated, and the estimation is evaluated by a scoring scheme.

MOD can find multiple PTMs in a single gap. In Figure 6, MOD identifies a deamidation on ‘N’ and a di-methylation on ‘K’ in a gap ‘NGK’. In general, considering the various combinations of possible modifications requires a prohibitive amount of computation time as the numbers of possible modification sites and types increase. This example demonstrates how MOD manages the time complexity of the modification analysis problem so that it can spend time on identifying multiple closely located PTMs in a peptide. This is possible because, before starting modification analysis for each gap, MOD has already identified the regions of the peptide that do not contain modified residues and is anchored on those regions. It thus narrows the search space for multiple PTMs down to each gap region of a peptide, which is generally much shorter than the entire peptide.
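A rough count illustrates the size of this reduction. The sketch below is a deliberately crude upper bound and not a description of MOD's internals: it assumes that every residue is either unmodified or carries any one of the PTM types, ignores residue specificity, and uses the histone peptide of Figure 5 with its four gaps (GK, K, K, KR) and 211 PTM types as example inputs.

```python
def assignments(region_length, ptm_types):
    """Upper bound on PTM assignments for a region: each residue is
    either unmodified or carries one of `ptm_types` modifications."""
    return (ptm_types + 1) ** region_length

PEPTIDE_LENGTH = 14               # 'GKGGKGLGKGGAKR'
GAPS = ["GK", "K", "K", "KR"]     # regions not covered by tags
PTM_TYPES = 211

# Treating the whole peptide as one region multiplies the choices.
whole_peptide = assignments(PEPTIDE_LENGTH, PTM_TYPES)

# Treating each gap separately only adds the per-gap counts.
per_gap_total = sum(assignments(len(gap), PTM_TYPES) for gap in GAPS)

print(f"whole peptide: {whole_peptide:.3e} assignments")
print(f"gap by gap:    {per_gap_total} assignments")
```

The multiplicative count for the whole peptide becomes an additive count over short gaps, which is the essence of anchoring on unmodified tags.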

Figure 6

An example showing that MOD can find multiple PTMs in a single gap. It identifies a deamidation on ‘N’ and a di-methylation on ‘K’ in a gap ‘NGK’.

DISCUSSION

Identifying multiple PTMs in a single gap mandates trying all possible combinations of feasible PTMs. This implies that as the number and the types of PTMs being considered grow, the number of combinations grows exponentially. This is why most of the previous methods limit the number of PTM sites and PTM types being considered. However, MOD manages the computational complexity of PTM identification and can therefore interpret a tandem mass spectrum when hundreds of modification types are considered and the number of potential PTMs in a peptide is large. MOD differs from existing methods based on sequence tags in that it aligns multiple sequence tags to a candidate peptide and isolates the regions that might include post-translationally modified amino acids, while most of the previous methods align a single tag to a candidate peptide and try to identify modifications over the entire peptide. By first fixing the unmodified regions in a peptide and then interpreting potentially modified amino acids in the relatively small regions between these unmodified tags, MOD can greatly reduce the search space for PTMs.

In our experiments, MOD demonstrates its power by identifying uncommon modifications and artefacts such as di-methylation, acrylamide adduct (propionamide), cysteine oxidation to cysteic acid and tryptophan oxidation to formylkynurenine. Such results are in accordance with the recent publication by the Pevzner group, in which a variety of PTMs are reported (18).

Owing to the efficiency gain obtained by our method, MOD runs in a reasonable amount of time. We tested MOD on a web server with dual Intel Xeon 3.06 GHz processors and 2 GB of memory, running Windows Server 2003. For a dataset of 5684 spectra, with 211 different PTM types taken from unimod and a protein database of nine protein sequences obtained from Mascot's protein ID results, the run took ∼360 s. When the same dataset was run with a database of 20 proteins (an additional 11 random proteins from the IPI human database), it took ∼1070 s.

The current version of MOD assumes that the number of candidate proteins is limited to 20 or fewer, and focuses on finding PTMs in tandem mass spectra. Candidate proteins can be easily identified from tandem mass spectra that do not contain PTMs using any existing method such as SEQUEST, Mascot or X!Tandem (7,8,19). In order to conduct protein identification on the same web server, instead of using separate software tools, we are planning to generalize the current version of MOD so that it also includes the protein identification step.

REFERENCES (19 in total)

1. SCOPE: a probabilistic model for scoring tandem mass spectra against a peptide database.

Authors: V Bafna; N Edwards
Journal: Bioinformatics Date: 2001 Impact factor: 6.937

2. A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry.

Authors: T Chen; M Y Kao; M Tepel; J Rush; G M Church
Journal: J Comput Biol Date: 2001 Impact factor: 1.479

3. High-throughput mass spectrometric discovery of protein post-translational modifications.

Authors: M R Wilkins; E Gasteiger; A A Gooley; B R Herbert; M P Molloy; P A Binz; K Ou; J C Sanchez; A Bairoch; K L Williams; D F Hochstrasser
Journal: J Mol Biol Date: 1999-06-11 Impact factor: 5.469

4. GutenTag: high-throughput sequence tagging via an empirically derived fragmentation model.

Authors: David L Tabb; Anita Saraf; John R Yates
Journal: Anal Chem Date: 2003-12-01 Impact factor: 6.986

5. A method for reducing the time required to match protein sequences with tandem mass spectra.

Authors: Robertson Craig; Ronald C Beavis
Journal: Rapid Commun Mass Spectrom Date: 2003 Impact factor: 2.419

6. PepNovo: de novo peptide sequencing via probabilistic network modeling.

Authors: Ari Frank; Pavel Pevzner
Journal: Anal Chem Date: 2005-02-15 Impact factor: 6.986

7. VEMS 3.0: algorithms and computational tools for tandem mass spectrometry based identification of post-translational modifications in proteins.

Authors: Rune Matthiesen; Morten Beck Trelle; Peter Højrup; Jakob Bunkenborg; Ole N Jensen
Journal: J Proteome Res Date: 2005 Nov-Dec Impact factor: 4.466

8. Identification of post-translational modifications by blind search of mass spectra.

Authors: Dekel Tsur; Stephen Tanner; Ebrahim Zandi; Vineet Bafna; Pavel A Pevzner
Journal: Nat Biotechnol Date: 2005-11-27 Impact factor: 54.908

9. Error-tolerant identification of peptides in sequence databases by peptide sequence tags.

Authors: M Mann; M Wilm
Journal: Anal Chem Date: 1994-12-15 Impact factor: 6.986

10. A major step on the road to understanding a unique posttranslational modification and its role in a genetic disease. (Review)

Authors: Jacques U Baenziger
Journal: Cell Date: 2003-05-16 Impact factor: 41.582
