Rima Chaudhuri, Arash Sadrieh, Nolan J Hoffman, Benjamin L Parker, Sean J Humphrey, Jacqueline Stöckli, Adam P Hill, David E James, Jean Yee Hwa Yang.
Abstract
BACKGROUND: Most biological processes are influenced by protein post-translational modifications (PTMs). Identifying novel PTM sites in different organisms, including humans and model organisms, has expedited our understanding of key signal transduction mechanisms. However, with the increasing availability of deep, quantitative datasets in diverse species, there is a growing need for tools that facilitate cross-species comparison of PTM data. This is particularly important because functionally important modification sites are more likely to be evolutionarily conserved; yet cross-species comparison of PTMs is difficult because they often lie in structurally disordered protein domains. Current tools that address this can only map known PTMs between species based on known orthologous phosphosites, and do not enable the cross-species mapping of newly identified modification sites. Here, we addressed this by developing a web-based software tool, PhosphOrtholog (www.phosphortholog.com), that accurately maps protein modification sites between different species. This facilitates the comparison of datasets derived from multiple species, and should be a valuable tool for the proteomics community.
Year: 2015 PMID: 26283093 PMCID: PMC4539857 DOI: 10.1186/s12864-015-1820-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Example input for PhosphOrtholog
| Record identifier | Site | Species code |
|---|---|---|
| QNGSNDS(0.001)DRYS(0.999)DNEEDSK | P42167_S184 | 0 |
| RAES(0.996)RT(0.193)S(0.811)VGS(1)QR | Q9BR39_S235 | 0 |
| S(0.002)ES(0.992)RT(0.005)S(0.001)LGSQR | Q2PS20_S228 | 1 |
| QNGSNDS(0.001)DRYS(0.999)DNDEDSKIELK | Q62733_S183 | 1 |
Users are required to input data in the format shown in this table, and information for both species must be entered. The first column must be the record identifier, which can be any unique identifier; in the example here, the unique identifiers are the peptide sequences for human and rat PTM sites. The second column gives the UniProt ID of the protein, the modified residue type in one-letter code and the modification site position; an underscore must separate the UniProt ID from the site information. The third column gives the species code (human: 0, rat: 1, mouse: 2, fly: 3). If data for only one species are entered, PhosphOrtholog returns an error asking the user to input data in the correct format.
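The format rules above can be checked before upload. The following is a minimal Python sketch, not PhosphOrtholog's own validation code: the names `SiteRecord`, `SITE_RE`, `parse_row` and `parse_input` are hypothetical, and the regular expression is an assumed, deliberately loose reading of "UniProt ID, underscore, one-letter residue, position".

```python
import re
from typing import NamedTuple

# Species codes as listed in the table note.
SPECIES = {0: "human", 1: "rat", 2: "mouse", 3: "fly"}

# Assumed pattern: accession, underscore, one-letter residue, site position.
SITE_RE = re.compile(r"^([A-Za-z0-9-]+)_([A-Z])(\d+)$")


class SiteRecord(NamedTuple):
    record_id: str
    accession: str
    residue: str
    position: int
    species: str


def parse_row(record_id, site, species_code):
    """Split one input row into its documented components."""
    match = SITE_RE.match(site.strip())
    if match is None:
        raise ValueError(f"site {site!r} is not of the form <UniProtID>_<residue><position>")
    code = int(species_code)
    if code not in SPECIES:
        raise ValueError(f"unknown species code {species_code!r}")
    accession, residue, position = match.groups()
    return SiteRecord(record_id.strip(), accession, residue, int(position), SPECIES[code])


def parse_input(rows):
    """Parse all rows and reject input that covers fewer than two species."""
    records = [parse_row(*row) for row in rows]
    if len({record.species for record in records}) < 2:
        raise ValueError("data for both species must be entered")
    return records


# Two rows taken from the example table (one human, one rat).
rows = [
    ("QNGSNDS(0.001)DRYS(0.999)DNEEDSK", "P42167_S184", "0"),
    ("QNGSNDS(0.001)DRYS(0.999)DNDEDSKIELK", "Q62733_S183", "1"),
]
records = parse_input(rows)
```

Rejecting single-species input locally mirrors the error the tool is described as returning, so a malformed file can be caught before it is uploaded.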
Fig. 1 User interface snapshots. a Input interface: the instructions for generating the input data format, including a description of each column, are given in “Step #1” on the PhosphOrtholog main page. The input interface also shows an example of the required data format in the table below the text “For example”. The data in the example table can be used as input by clicking the “Use above example” button, and mapped by clicking “Map”. Input data can also be copy-pasted, edited or deleted in the “Preview for input data set” table, which behaves like an Excel spreadsheet. Three separate example input files can be downloaded through the ‘download’ links immediately below the example data table and uploaded through the “Upload” button. User-provided datasets (in comma-delimited format) can be uploaded for mapping via the “Upload” button, copy-pasted into the preview input table, or typed in. b Output interface: once mapping is started with the ‘Map’ button in “Step # 2”, the progress bar above the output table in “Step # 3” tracks the progress of the mapping function, giving a rough estimate of how long the job will take for large data sets. The first two columns of the mapped output table give the species 1 record identifier and PTM site details, which are mapped to the orthologous species 2 site information shown in the third and fourth columns. The last column gives the E-value significance score from the pairwise sequence alignment of the orthologous proteins. If the PTM site is a known mapped site from the PhosphoSitePlus database, this column reports “From PhosphoSitePlus” instead of an E-value. Once mapping is complete, the progress bar also reports the number of novel sites mapped by PhosphOrtholog, the percentage of novel sites in the data set that could be mapped, and the percentage of known sites from PhosphoSitePlus that could be recovered by PhosphOrtholog.
Fig. 2 Software architecture. The four layers of the software implementation and the communications between the layers are illustrated. The storage layer shows the six reference ortholog mapping databases, in which each species is abbreviated by its first letter: human by H, rat by R, mouse by M and fly by F. The database of annotated PTM sites obtained from PhosphoSitePlus is represented as PSP.
Fig. 3 The algorithmic workflow. The flowchart gives a schematic representation of the algorithm, showing the four stages through which the input data are analyzed to return the mapped sites.
Fig. 4 Role of PhosphOrtholog in the MS-based PTM data analysis pipeline. The four broad stages of an MS-based PTM experiment are illustrated. In Stage 1, samples are extracted and prepared from human and rat muscle tissues for the MS-based phosphoproteomics experiment. In Stage 2, the raw spectral data are analyzed to generate peptide and protein annotations, along with intensity measures for the PTMs induced by the experimental design in each species. In Stage 3, the output from Stage 2 is parsed to extract the leading UniProt ID (‘Uniprot_ACC’), modified amino acid type (‘AminoAcid’) and modification site number (‘Site#’) for each species, and these are concatenated into the input format required for PhosphOrtholog mapping. We showcase PTM examples for the proteins ULK1 (2 sites in human and rat) and ACACA (3 sites in human and rat); the column ‘ModificationSite’ gives the peptide sequence containing the identified PTM site, with the number in parentheses after an amino acid indicating the probability that it is phosphorylated. In Stage 4, the sites mapped by PhosphOrtholog are obtained. Each is annotated either as newly mapped, with a calculated E-value (4 out of 5 input sites were not mapped before, identified with an E-value of 0), or with “From PhosphoSitePlus” if the mapping was previously known (the mapping between human ACACA site S80 and rat ACACA site S79 is annotated in the PhosphoSitePlus database).
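The Stage 3 concatenation can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the authors' parsing script: the function names `to_phosphortholog_rows` and `write_csv` are hypothetical, the parsed Stage 2 output is assumed to be available as dictionaries keyed by the column names quoted in the caption, and the file is assumed to need no header row. The sample record reuses a human site from the example input table.

```python
import csv
import io


def to_phosphortholog_rows(records, species_code):
    """Build (record identifier, <UniProtID>_<residue><position>, species code) rows."""
    rows = []
    for rec in records:
        # Underscore between the UniProt ID and the site information.
        site = f"{rec['Uniprot_ACC']}_{rec['AminoAcid']}{rec['Site#']}"
        rows.append([rec["ModificationSite"], site, species_code])
    return rows


def write_csv(rows):
    """Serialise the rows as comma-delimited text (no header row: an assumption)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# One human record (species code 0), taken from the example input table.
human = [
    {
        "Uniprot_ACC": "P42167",
        "AminoAcid": "S",
        "Site#": 184,
        "ModificationSite": "QNGSNDS(0.001)DRYS(0.999)DNEEDSK",
    }
]
human_rows = to_phosphortholog_rows(human, species_code=0)
csv_text = write_csv(human_rows)
```

Rows for the second species would be built the same way with its own species code and appended before upload, since data for both species are required.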
Fig. 5 Increased coverage of common sites. The utility and efficiency of PhosphOrtholog compared to PhosphoSitePlus are shown for three example datasets comprising human, rat and mouse phosphoproteomes. Relative to PhosphoSitePlus, the coverage of conserved sites identified by PhosphOrtholog increased by 136 % in dataset 1 (from 83 annotated sites in PhosphoSitePlus to 196 mapped sites, an additional 113 novel orthologous PTM site matches), by 148 % in dataset 2 (from 473 to 1174 mapped sites, an increase of 701 novel site matches) and by 177 % in dataset 3 (from 475 to 1315 sites, adding 840 novel site matches).
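The percentages in this caption follow from the site counts as (mapped − annotated) / annotated. A short Python check, in which the helper name `coverage_gain` is hypothetical and the counts are those quoted in the caption:

```python
# (sites annotated in PhosphoSitePlus, sites mapped by PhosphOrtholog)
datasets = {
    "dataset 1": (83, 196),
    "dataset 2": (473, 1174),
    "dataset 3": (475, 1315),
}


def coverage_gain(annotated, mapped):
    """Return the number of novel site matches and the percentage increase."""
    novel = mapped - annotated
    return novel, round(100 * novel / annotated)


gains = {name: coverage_gain(*counts) for name, counts in datasets.items()}
for name, (novel, percent) in gains.items():
    print(f"{name}: {novel} novel site matches, +{percent} %")
```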